Methods and compounds, including compositions 
therefrom, are provided for determining the sequence 
of nucleic acid molecules. The methods permit the 
determination of multiple nucleic acid sequences 
simultaneously. The compounds are used as tags to generate 
tagged nucleic acid fragments which are complementary 
to a selected target nucleic acid molecule. Each tag is 
correlative with a particular nucleotide and, in a preferred 
embodiment, is detectable by mass spectrometry. Following 
separation of the tagged fragments by sequential length, the 
tags are cleaved from the tagged fragments. In a preferred 
embodiment, the tags are detected by mass spectrometry 
and the sequence of the nucleic acid molecule is determined 
therefrom. The individual steps of the methods can be used 
in automated format, e.g., by the incorporation into systems. 
Description 

METHODS AND COMPOSITIONS FOR DETERMINING 
THE SEQUENCE OF NUCLEIC ACID MOLECULES 


TECHNICAL FIELD 

The present invention relates generally to methods and compositions for 
determining the sequence of nucleic acid molecules, and more specifically, to methods 
and compositions which allow the determination of multiple nucleic acid sequences 
simultaneously. 






BACKGROUND OF THE INVENTION 

Deoxyribonucleic acid (DNA) sequencing is one of the basic techniques 
of biology. It is at the heart of molecular biology and plays a rapidly expanding role in 
the rest of biology. The Human Genome Project is a multi-national effort to read the 
entire human genetic code. It is the largest project ever undertaken in biology, and has 
already begun to have a major impact on medicine. The development of cheaper and 
faster sequencing technology will ensure the success of this project. Indeed, a 
substantial effort to improve sequencing technology has been funded by the NIH and 
DOE branches of the Human Genome Project, but without a substantial 
impact on current practices (Sulston and Waterston, Nature 376:175, 1995). 

In the past two decades, determination and analysis of nucleic acid 
sequence has formed one of the building blocks of biological research. This, along with 
new investigational tools and methodologies, has allowed scientists to study genes and 
gene products in order to better understand the function of these genes, as well as to 
develop new therapeutics and diagnostics. 

Two different DNA sequencing methodologies that were developed in 
1977 are still in wide use today. Briefly, the enzymatic method described by Sanger 
(Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy-terminators, 
involves the synthesis of a DNA strand from a single-stranded template by a DNA 
polymerase. The Sanger method of sequencing depends on the fact that 
dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way 
as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ 
from normal deoxynucleotides (dNTPs) in that they lack the 3'-OH group necessary for 
chain elongation. When a ddNTP is incorporated into the DNA chain, the absence of 
the 3'-hydroxy group prevents the formation of a new phosphodiester bond and the 
DNA fragment is terminated with the ddNTP complementary to the base in the template 
DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci. 
(USA) 74:560, 1977) employs a chemical degradation method of the original DNA (in 
both cases the DNA must be clonal). Both methods produce populations of fragments 
that begin from a particular point and terminate in every base that is found in the DNA 
fragment that is to be sequenced. The termination of each fragment is dependent on the 
location of a particular base within the original DNA fragment. The DNA fragments 
are separated by polyacrylamide gel electrophoresis and the order of the DNA bases 
(adenine, cytosine, thymine, guanine; also known as A, C, T, G, respectively) is read from 
an autoradiograph of the gel. 

A cumbersome DNA pooling sequencing strategy (Church and Kieffer-Higgins, 
Science 240:185, 1988) is one of the more recent approaches to DNA 
sequencing. A pooling sequencing strategy consists of pooling a number of DNA 
templates (samples) and processing the samples as pools. In order to separate the 
sequence information at the end of the processing, the DNA molecules of interest are 
ligated to a set of oligonucleotide "tags" at the beginning. The tagged DNA molecules 
are pooled, amplified and chemically fragmented in 96-well plates. After 
electrophoresis of the pooled samples, the DNA is transferred to a solid support and 
then hybridized with a sequential series of specific labeled oligonucleotides. These 
membranes are then probed as many times as there are tags in the original pool, 
producing, in each set of probing, autoradiographs similar to those from standard DNA 
sequencing methods. Thus each reaction and gel yields a quantity of data equivalent to 
that obtained from conventional reactions and gels multiplied by the number of probes 
used. If alkaline phosphatase is used as the reporter enzyme, a 1,2-dioxetane substrate 
can be used, which is detected in a chemiluminescent assay format. However, this 
pooling strategy's major disadvantage is that the sequences can only be read by 






WO 97/27331 PCT/US97/01304 



Southern blotting the sequencing gel and hybridizing this membrane once for each 
clone in the pool. 

In addition to advances in sequencing methodologies, advances in speed 
have occurred due to the advent of automated DNA sequencing. Briefly, these methods 
use fluorescent-labeled primers, which replace methods that employed radiolabeled 
components. Fluorescent dyes are attached either to the sequencing primers or the 
ddNTP-terminators. Robotic components now utilize polymerase chain reaction (PCR) 
technology, which has led to the development of linear amplification strategies. 
Current commercial sequencing allows all 4 dideoxy-terminator reactions to be run on a 
single lane. Each dideoxy-terminator reaction is represented by a unique fluorescent 
primer (one fluorophore for each base type: A, T, C, G). Only one template DNA (i.e., 
DNA sample) is represented per lane. Current gels permit the simultaneous 
electrophoresis of up to 64 samples in 64 different lanes. Different ddNTP-terminated 
fragments are detected by the irradiation of the gel lane by light followed by detection 
of emitted light from the fluorophore. Each electrophoresis step is about 4-6 hours 
long. Each electrophoresis separation resolves about 400-600 nucleotides (nt); 
therefore, about 6000 nt can be sequenced per hour per sequencer. 
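
The throughput figure quoted above can be checked with a short calculation. The sketch below is only illustrative: it assumes every one of the 64 lanes is loaded and that the read length and run time fall within the ranges stated in the text; the function name is hypothetical.

```python
def throughput_nt_per_hour(lanes: int, read_length_nt: float, run_hours: float) -> float:
    """Nucleotides read per hour for one gel run on one sequencer."""
    return lanes * read_length_nt / run_hours


# Ranges taken from the text: 64 lanes, 400-600 nt per lane, 4-6 hours per run.
low = throughput_nt_per_hour(64, 400, 6)    # shortest reads, longest run
mid = throughput_nt_per_hour(64, 500, 5)    # midpoints of both ranges
high = throughput_nt_per_hour(64, 600, 4)   # longest reads, shortest run

print(f"low={low:.0f}, mid={mid:.0f}, high={high:.0f} nt/hour")
```

The midpoint case gives 6400 nt per hour, which is consistent with the "about 6000 nt" stated above.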

The use of mass spectrometry for the study of monomeric constituents of 
nucleic acids has also been described (Hignite, in Biochemical Applications of Mass 
Spectrometry, Waller and Dermer (eds.), Wiley-Interscience, Chapter 16, p. 527, 1972). 
Briefly, for larger oligomers, significant early success was obtained by plasma 
desorption for protected synthetic oligonucleotides up to 14 bases long, and for 
unprotected oligos up to 4 bases in length. As with proteins, the applicability of ESI-MS 
to oligonucleotides has been demonstrated (Covey et al., Rapid Comm. in Mass 
Spec. 2:249-256, 1988). These species are ionized in solution, with the charge residing 
at the acidic bridging phosphodiester and/or terminal phosphate moieties, and yield in 
the gas phase multiple charged molecular anions, in addition to sodium adducts. 

Sequencing DNA with <100 bases by the common enzymatic ddNTP 
technique is more complicated than it is for larger DNA templates, so that chemical 
degradation is sometimes employed. However, the chemical decomposition method 
requires about 50 pmol of radioactive 32P end-labeled material, 6 chemical steps, 
electrophoretic separation, and film exposure. For small oligonucleotides (<14 nts) the 
combination of electrospray ionization (ESI) and Fourier transform (FT) mass 
spectrometry (MS) is far faster and more sensitive. Dissociation products of 
multiply-charged ions measured at high (10^5) resolving power represent consecutive backbone 
cleavages providing the full sequence in less than one minute on sub-picomole quantity 
of sample (Little et al., J. Am. Chem. Soc. 116:4893, 1994). For molecular weight 
measurements, ESI/MS has been extended to larger fragments (Potier et al., Nuc. Acids 
Res. 22:3895, 1994). ESI/FTMS appears to be a valuable complement to classical 
methods for sequencing and pinpointing mutations in nucleotides as large as 100-mers. 
Spectral data have recently been obtained loading 3 x 10^-13 mol of a 50-mer using a 
more sensitive ESI source (Valaskovic, Anal. Chem. 68:259, 1995). 

The other approach to DNA sequencing by mass spectrometry is one in 
which DNA is labeled with individual isotopes of an element and the mass spectral 

analysis simply has to distinguish the isotopes after a mixture of sizes of DNA has 
been separated by electrophoresis. (The other approach described above utilizes the 
resolving power of the mass spectrometer to both separate and detect the DNA 
oligonucleotides of different lengths, a difficult proposition at best.) All of the 
procedures described below employ the Sanger procedure to convert a sequencing 

primer to a series of DNA fragments that vary in length by one nucleotide. The 
enzymatically synthesized DNA molecules each contain the original primer, a replicated 
sequence of part of the DNA of interest, and the dideoxy terminator. That is, a set of 
DNA molecules is produced that contain the primer and differ in length from each 
other by one nucleotide residue. 

Brennen et al. (Biol. Mass Spec., New York, Elsevier, p. 219, 1990) have 
described methods to use the four stable isotopes of sulfur as DNA labels that enable 
one to detect DNA fragments that have been separated by capillary electrophoresis. 
Using the α-thio analogues of the ddNTPs, a single sulfur isotope is incorporated into 
each of the DNA fragments. Therefore each of the four types of DNA fragments 
(ddTTP, ddATP, ddGTP, ddCTP-terminated) can be uniquely labeled according to the 
terminal nucleotide; for example, 32S for fragments ending in A, 33S for G, 34S for C, and 
36S for T, and mixed together for electrophoresis. From the column, fractions of a few picoliters 
are obtained by a modified ink-jet printer head, and then subjected to complete 
combustion in a furnace. This process oxidizes the thiophosphates of the labeled DNA 
to SO2, which is subjected to analysis in a quadrupole or magnetic sector mass 
spectrometer. The SO2 mass unit representation is 64 for fragments ending in A, 65 for 
G, 66 for C, and 68 for T. Maintenance of the resolution of the DNA fragments as they 
emerge from the column depends on taking sufficiently small fractions. Because the 
mass spectrometer is coupled directly to the capillary gel column, the rate of analysis is 
determined by the rate of electrophoresis. This process is unfortunately expensive, 
liberates radioactive gas and has not been commercialized. Two other basic constraints 
also operate on this approach: (a) No other components with mass of 64, 65, 66, or 68 
(isobaric contaminants) can be tolerated and (b) the % natural abundances of the sulfur 
isotopes (32S is 95.0, 33S is 0.75, 34S is 4.2, and 36S is 0.11) govern the sensitivity and 
cost. Since 32S is 95% naturally abundant, the other isotopes must be enriched to >99% 
to eliminate contaminating 32S. Isotopes that are <1% abundant are quite expensive to 
obtain at 99% enrichment; even when 36S is purified 100-fold it contains as much or 
more 32S as it does 36S. 
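
The SO2 mass values quoted for the sulfur-isotope scheme follow directly from the isotope mass numbers. The sketch below is a minimal illustration, assuming nominal (integer) masses and that both oxygen atoms are 16O; the base-to-isotope assignment is the example given in the text, and the names are hypothetical.

```python
OXYGEN_MASS_NUMBER = 16  # assumption: both oxygens in SO2 are 16O

# Example assignment from the text: terminal base -> sulfur isotope mass number.
SULFUR_LABELS = {"A": 32, "G": 33, "C": 34, "T": 36}


def so2_nominal_mass(sulfur_mass_number: int) -> int:
    """Nominal mass of SO2 formed by combustion of a thiophosphate label."""
    return sulfur_mass_number + 2 * OXYGEN_MASS_NUMBER


# Lookup used when reading the spectrometer: observed SO2 mass -> terminal base.
SO2_MASS_TO_BASE = {so2_nominal_mass(s): base for base, s in SULFUR_LABELS.items()}

print(SO2_MASS_TO_BASE)
```

The resulting masses (64, 65, 66 and 68) match those stated above, and also show why any contaminant at one of these four masses would be read as a base call.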

Gilbert has described an automated DNA sequencer (EPA 92108678.2) 
that consists of an oligomer synthesizer, an array on a membrane, a detector which 
detects hybridization and a central computer. The synthesizer synthesizes and labels 
multiple oligomers of arbitrary predicted sequence. The oligomers are used to probe 
immobilized DNA on membranes. The detector identifies hybridization patterns and 
then sends those patterns to a central computer which constructs a sequence and then 
predicts the sequence of the next round of synthesis of oligomers. Through an iterative 
process, a DNA sequence can be obtained in an automated fashion. 

Brennen has described a method for sequencing nucleic acids based on 
ligation of oligomers (U.S. Patent No. 5,403,708). Methods and compositions are 
described for forming ligation product hybridized to a nucleic acid template. A primer 
is hybridized to a DNA template and then a pool of random extension oligonucleotides 
is also hybridized to the primed template in the presence of ligase(s). The ligase enzyme 
covalently ligates the hybridized oligomers to the primer. Modifications permit the 
determination of the nucleotide sequence of one or more members of a first set of target 
nucleotide residues in the nucleic acid template that are spaced at intervals of N 
nucleotides. In this method, the labeled ligated product is formed wherein the position 
and type of label incorporated into the ligation product provides information concerning 
the nucleotide residue in the nucleic acid template with which the labeled nucleotide 

residue is base paired. 

Koster has described a method for sequencing DNA by mass 
spectrometry after degradation of DNA by an exonuclease (PCT/US94/02938). The 
method described is simple in that DNA sequence is directly determined (the Sanger 
reaction is not used). DNA is cloned into standard vectors, the 5' end is immobilized 
and the strands are then sequentially degraded at the 3' end via an exonuclease and the 
enzymatic products (nucleotides) are detected by mass spectrometry. 

Weiss et al. have described an automated hybridization/imaging device 
for fluorescent multiplex DNA sequencing (PCT/US94/11918). The method is based 
on the concept of hybridizing enzyme-linked probes to a membrane containing size 
separated DNA fragments arising from a typical Sanger reaction. 

The demand for sequencing information is larger than can be supplied by 
the currently existing sequencing machines, such as the ABI377 and the Pharmacia 
ALF. One of the principal limitations of the current technology is the small number of 
tags which can be resolved using the current tagging system. The Church pooling 
system discussed above uses more tags, but the use and detection of these tags is 
laborious. 

The present invention discloses novel compositions and methods which 
may be utilized to sequence nucleic acid molecules with greater speed and 
sensitivity than the methods described above, and further provides other related 
advantages. 
SUMMARY OF THE INVENTION 

Briefly stated, the present invention provides methods, compounds, 
compositions, kits and systems for determining the sequence of nucleic acid molecules. 
Within one aspect of the invention, methods are provided for determining the sequence 
of a nucleic acid molecule. The methods include the steps: (a) generating tagged 
nucleic acid fragments which are complementary to a selected target nucleic acid 
molecule, wherein a tag is correlative with a particular nucleotide and detectable by 
non-fluorescent spectrometry or potentiometry; (b) separating the tagged fragments by 
sequential length; (c) cleaving the tags from the tagged fragments; and (d) detecting the 
tags by non-fluorescent spectrometry or potentiometry, and therefrom determining the 
sequence of the nucleic acid molecule. In preferred embodiments, the tags are detected 
by mass spectrometry, infrared spectrometry, ultraviolet spectrometry or potentiostatic 
amperometry. 
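
The way steps (a) through (d) yield a sequence can be sketched in a few lines. This is only an illustration under stated assumptions: the tag names, the fragment lengths and the `(length, tag)` input format are hypothetical, and real detection would come from an instrument rather than a list. Each detected tag is mapped back to the nucleotide it is correlative with, in order of fragment length.

```python
# Hypothetical tag names; each tag is correlative with one terminal nucleotide.
TAG_TO_BASE = {"tag_A": "A", "tag_C": "C", "tag_G": "G", "tag_T": "T"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_sequence(detections):
    """Order (fragment_length, tag) pairs by length and translate tags to bases."""
    ordered = sorted(detections, key=lambda d: d[0])
    return "".join(TAG_TO_BASE[tag] for _, tag in ordered)


def target_strand(read):
    """The fragments are complementary to the target, so reverse-complement the read."""
    return "".join(COMPLEMENT[base] for base in reversed(read))


# Tags as they might be detected after separation and cleavage (arbitrary order here).
detections = [(23, "tag_G"), (21, "tag_A"), (22, "tag_C"), (24, "tag_G")]
read = read_sequence(detections)
print(read, target_strand(read))
```

Because only the tag, not the fragment, is detected, the same lookup works for any detection method that can tell the tags apart.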

In another aspect, the invention provides a compound of the formula: 
T^ms-L-X 

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at 
least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, 
sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing 
moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing 
moiety comprises a functional group which supports a single ionized charge state when 
the compound is subjected to mass spectrometry and is selected from tertiary amine, 
quaternary amine and organic acid; X is a functional group selected from hydroxyl, 
amino, thiol, carboxylic acid, haloalkyl, and derivatives thereof which either activate or 
inhibit the activity of the group toward coupling with other moieties, or is a nucleic acid 
fragment attached to L at other than the 3' end of the nucleic acid fragment; with the 
provisos that the compound is not bonded to a solid support through X nor has a mass 
of less than 250 daltons. 

In another aspect, the invention provides a composition comprising a 
plurality of compounds of the formula T^ms-L-MOI, wherein, T^ms is an organic group 
detectable by mass spectrometry, comprising carbon, at least one of hydrogen and 
fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and 
iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved 
from the remainder of the compound, wherein the T^ms-containing moiety comprises 
a functional group which supports a single ionized charge state when the compound 
is subjected to mass spectrometry and is selected from tertiary amine, quaternary 
amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to 
the MOI at a location other than the 3' end of the MOI; and wherein no two 
compounds have either the same T^ms or the same MOI. 

In another aspect, the invention provides a composition comprising 
water and a compound of the formula T^ms-L-MOI, wherein, T^ms is an organic group 
detectable by mass spectrometry, comprising carbon, at least one of hydrogen and 
fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and 
iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved 
from the remainder of the compound, wherein the T^ms-containing moiety comprises 
a functional group which supports a single ionized charge state when the compound 
is subjected to mass spectrometry and is selected from tertiary amine, quaternary 
amine and organic acid; and MOI is a nucleic acid fragment wherein L is conjugated 
to the MOI at a location other than the 3' end of the MOI. 

In another aspect, the invention provides for a composition 
comprising a plurality of sets of compounds, each set of compounds having the 
formula T^ms-L-MOI, wherein, T^ms is an organic group detectable by mass 
spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional 
atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an 
organic group which allows a T^ms-containing moiety to be cleaved from the 
remainder of the compound, wherein the T^ms-containing moiety comprises a 
functional group which supports a single ionized charge state when the compound is 
subjected to mass spectrometry and is selected from tertiary amine, quaternary 
amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to 
the MOI at a location other than the 3' end of the MOI; wherein within a set, all 
members have the same T^ms group, and the MOI fragments have variable lengths 
that terminate with the same dideoxynucleotide selected from ddAMP, ddGMP, 
ddCMP and ddTMP; and wherein between sets, the T^ms groups differ by at least 2 
amu. 
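
The requirement that tag groups in different sets differ by at least 2 amu can be checked pairwise. The sketch below is illustrative only: the tag masses are made-up values, not taken from the specification, and the function name is hypothetical.

```python
from itertools import combinations

MIN_SPACING_AMU = 2.0  # minimum mass difference between tags of different sets


def tags_too_close(tag_masses, min_spacing=MIN_SPACING_AMU):
    """Return pairs of set names whose tag masses differ by less than min_spacing."""
    return [
        (name_a, name_b)
        for (name_a, mass_a), (name_b, mass_b) in combinations(sorted(tag_masses.items()), 2)
        if abs(mass_a - mass_b) < min_spacing
    ]


# Hypothetical tag masses (amu), one tag per dideoxynucleotide-terminated set.
example_sets = {"ddAMP": 300.0, "ddCMP": 302.0, "ddGMP": 305.0, "ddTMP": 305.5}
print(tags_too_close(example_sets))
```

In this made-up example only the ddGMP and ddTMP tags (0.5 amu apart) fail the check; a spacing of exactly 2 amu passes.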






In another aspect, the invention provides for a composition 
comprising a first plurality of sets of compounds as described in the preceding 
paragraph, in combination with a second plurality of sets of compounds having the 
formula T^ms-L-MOI, wherein, T^ms is an organic group detectable by mass 
spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional 
atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an 
organic group which allows a T^ms-containing moiety to be cleaved from the 
remainder of the compound, wherein the T^ms-containing moiety comprises a 
functional group which supports a single ionized charge state when the compound is 
subjected to mass spectrometry and is selected from tertiary amine, quaternary 
amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to 
the MOI at a location other than the 3' end of the MOI; and wherein all members 
within the second plurality have an MOI sequence which terminates with the same 
dideoxynucleotide selected from ddAMP, ddGMP, ddCMP and ddTMP; with the 
proviso that the dideoxynucleotide present in the compounds of the first plurality is 
not the same dideoxynucleotide present in the compounds of the second plurality. 

In another aspect, the invention provides for a kit for DNA 
sequencing analysis. The kit comprises a plurality of container sets, each container 
set comprising at least five containers, wherein a first container contains a vector, a 
second, third, fourth and fifth containers contain compounds of the formula 
T^ms-L-MOI wherein, T^ms is an organic group detectable by mass spectrometry, 
comprising carbon, at least one of hydrogen and fluoride, and optional atoms 
selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic 
group which allows a T^ms-containing moiety to be cleaved from the remainder of the 
compound, wherein the T^ms-containing moiety comprises a functional group which 
supports a single ionized charge state when the compound is subjected to mass 
spectrometry and is selected from tertiary amine, quaternary amine and organic 
acid; and MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a 
location other than the 3' end of the MOI; such that the MOI for the second, third, 
fourth and fifth containers is identical and complementary to a portion of the vector 
within the set of containers, and the T^ms group within each container is different 
from the other T^ms groups in the kit. 

In another aspect, the invention provides for a system for determining the 
sequence of a nucleic acid molecule. The system comprises a separation apparatus that 
separates tagged nucleic acid fragments, an apparatus that cleaves from a tagged nucleic 
acid fragment a tag which is correlative with a particular nucleotide and detectable by 

electrochemical detection, and an apparatus for potentiostatic amperometry. 

Within other embodiments of the invention, 4, 5, 10, 15, 20, 25, 30, 35, 
40, 50, 60, 70, 80, 90, 100, 200, 250, 300, 350, 400, 450 or greater than 500 different 
and unique tagged molecules may be utilized within a given reaction simultaneously, 
wherein each tag is unique for a selected nucleic acid fragment, probe, or first or second 

member, and may be separately identified. 

These and other aspects of the present invention will become evident 
upon reference to the following detailed description and attached drawings. In addition, 
various references are set forth below which describe in more detail certain procedures 
or compositions (e.g., plasmids, etc.), and are therefore incorporated by reference in 

their entirety. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts the flowchart for the synthesis of pentafluorophenyl 
esters of chemically cleavable mass spectroscopy tags, to liberate tags with carboxyl 
amide termini. 

Figure 2 depicts the flowchart for the synthesis of pentafluorophenyl 
esters of chemically cleavable mass spectroscopy tags, to liberate tags with carboxyl 
acid termini. 

Figures 3-6 and 8 depict the flowchart for the synthesis of 
tetrafluorophenyl esters of a set of 36 photochemically cleavable mass spectroscopy tags. 

Figure 7 depicts the flowchart for the synthesis of a set of 36 amine- 
terminated photochemically cleavable mass spectroscopy tags. 



Figure 9 depicts the synthesis of 36 photochemically cleavable mass 
spectroscopy tagged oligonucleotides made from the corresponding set of 36 
tetrafluorophenyl esters of photochemically cleavable mass spectroscopy tag acids. 

Figure 10 depicts the synthesis of 36 photochemically cleavable mass 
spectroscopy tagged oligonucleotides made from the corresponding set of 36 amine- 
terminated photochemically cleavable mass spectroscopy tags. 

Figure 11 illustrates the simultaneous detection of multiple tags by mass 
spectrometry. 

Figure 12 shows the mass spectrogram of the alpha-cyano matrix alone. 
Figure 13 depicts a modularly-constructed tagged nucleic acid fragment. 



DETAILED DESCRIPTION OF THE INVENTION 

Briefly stated, in one aspect the present invention provides compounds 
wherein a molecule of interest, or precursor thereto, is linked via a labile bond (or labile 
bonds) to a tag. Thus, compounds of the invention may be viewed as having the general 



formula: 

T-L-X 



wherein T is the tag component, L is the linker component that either is, or contains, a 
labile bond, and X is either the molecule of interest (MOI) component or a functional 
group component (L_h) through which the MOI may be joined to T-L. Compounds of 
the invention may therefore be represented by the more specific general formulas: 



T-L-MOI and T-L-L_h 



For reasons described in detail below, sets of T-L-MOI compounds may 
be purposely subjected to conditions that cause the labile bond(s) to break, thus 
releasing a tag moiety from the remainder of the compound. The tag moiety is then 
characterized by one or more analytical techniques, to thereby provide direct 
information about the structure of the tag moiety, and (most importantly) indirect 
information about the identity of the corresponding MOI. 




As a simple illustrative example of a representative compound of the 
invention wherein L is a direct bond, reference is made to the following structure (i): 




Structure (i) 

[Diagram not reproduced: the tag component joined by an amide bond to the molecule of interest component, a nucleic acid fragment.] 

In structure (i), T is a nitrogen-containing polycyclic aromatic moiety bonded to a 
carbonyl group, X is a MOI (and specifically a nucleic acid fragment terminating in an 
amine group), and L is the bond which forms an amide group. The amide bond is labile 
relative to the bonds in T because, as recognized in the art, an amide bond may be 
chemically cleaved (broken) by acid or base conditions which leave the bonds within 
the tag component unchanged. Thus, a tag moiety (i.e., the cleavage product that 
contains T) may be released as shown below: 
[Reaction scheme not reproduced: treatment of structure (i) with acid or base cleaves the amide bond, giving the tag moiety (bearing OH) and the remainder of the compound (the amine-terminated nucleic acid fragment).] 



However, the linker L may be more than merely a direct bond, as shown 
in the following illustrative example, where reference is made to another representative 
compound of the invention having the structure (ii) shown below: 



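Structure (ii) 

[Diagram not reproduced: the tag component of structure (i) joined to the nucleic acid fragment through a linker containing an NO2-substituted benzylamine moiety.] 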

It is well-known that compounds having an ortho-nitrobenzylamine moiety (see boxed 
atoms within structure (ii)) are photolytically unstable, in that exposure of such 
compounds to actinic radiation of a specified wavelength will cause selective cleavage 
of the benzylamine bond (see bond denoted with heavy line in structure (ii)). Thus 
structure (ii) has the same T and MOI groups as structure (i); however, the linker group 
contains multiple atoms and bonds within which there is a particularly labile bond. 
Photolysis of structure (ii) thus releases a tag moiety (T-containing moiety) from the 
remainder of the compound, as shown below. 




[Reaction scheme not reproduced: photolysis of structure (ii) gives the tag moiety and the remainder of the compound.] 

The invention thus provides compounds which, upon exposure to 
appropriate cleavage conditions, undergo a cleavage reaction so as to release a tag 
moiety from the remainder of the compound. Compounds of the invention may be 
described in terms of the tag moiety, the MOI (or precursor thereto, L_h), and the labile 
bond(s) which join the two groups together. Alternatively, the compounds of the 
invention may be described in terms of the components from which they are formed. 
Thus, the compounds may be described as the reaction product of a tag reactant, a linker 
reactant and a MOI reactant, as follows. 

The tag reactant consists of a chemical handle (T_h) and a variable 
component (T_vc), so that the tag reactant is seen to have the general structure: 

T_vc-T_h 

To illustrate this nomenclature, reference may be made to structure (iii), which shows a 
tag reactant that may be used to prepare the compound of structure (ii). The tag reactant 
having structure (iii) contains a tag variable component and a tag handle, as shown 
below: 



Structure (iii) 

[Diagram not reproduced: the tag variable component bonded to the tag handle.] 

In structure (iii), the tag handle (-C(=O)-A) simply provides an avenue 
for reacting the tag reactant with the linker reactant to form a T-L moiety. The group 
"A" in structure (iii) indicates that the carboxyl group is in a chemically active state, so 
it is ready for coupling with other handles. "A" may be, for example, a hydroxyl group 
or pentafluorophenoxy, among many other possibilities. The invention provides for a 
large number of possible tag handles which may be bonded to a tag variable component, 
as discussed in detail below. The tag variable component is thus a part of "T" in the 
formula T-L-X, and will also be part of the tag moiety that forms from the reaction that 
cleaves L. 

As also discussed in detail below, the tag variable component is so-named 
because, in preparing sets of compounds according to the invention, it is desired 
that members of a set have unique variable components, so that the individual members 
may be distinguished from one another by an analytical technique. As one example, the 
tag variable component of structure (iii) may be one member of the following set, where 
members of the set may be distinguished by their UV or mass spectra: 

Likewise, the linker reactant may be described in terms of its chemical 
handles (there are necessarily at least two, each of which may be designated as L_h) 
which flank a linker labile component, where the linker labile component consists of the 
required labile moiety (L^2) and optional labile moieties (L^1 and L^3), where the optional 
labile moieties effectively serve to separate L^2 from the handles L_h, and the required 
labile moiety serves to provide a labile bond within the linker labile component. Thus, 
the linker reactant may be seen to have the general formula: 

L_h-L^1-L^2-L^3-L_h 

The nomenclature used to describe the linker reactant may be illustrated 
in view of structure (iv), which again draws from the compound of structure (ii): 



Structure (iv) 

[Diagram not reproduced: an NO2-substituted benzylamine linker with a linker handle at each end.] 

As structure (iv) illustrates, atoms may serve in more than one functional
role. Thus, in structure (iv), the benzyl nitrogen functions as a chemical handle in
allowing the linker reactant to join to the tag reactant via an amide-forming reaction,
and subsequently also serves as a necessary part of the structure of the labile moiety L^2
in that the benzylic carbon-nitrogen bond is particularly susceptible to photolytic
cleavage. Structure (iv) also illustrates that a linker reactant may have an L^3 group (in
this case, a methylene group), although not have an L^1 group. Likewise, linker reactants
may have an L^1 group but not an L^3 group, or may have L^1 and L^3 groups, or may have
neither of L^1 nor L^3 groups. In structure (iv), the presence of the group "P" next to the
carbonyl group indicates that the carbonyl group is protected from reaction. Given this
configuration, the activated carboxyl group of the tag reactant (iii) may cleanly react
with the amine group of the linker reactant (iv) to form an amide bond and give a
compound of the formula T-L-L_h.

The MOI reactant is a suitably reactive form of a molecule of interest. 
Where the molecule of interest is a nucleic acid fragment, a suitable MOI reactant is a
nucleic acid fragment bonded through its 5' hydroxyl group to a phosphodiester group
and then to an alkylene chain that terminates in an amino group. This amino group may
then react with the carbonyl group of structure (iv) (after, of course, deprotecting the
carbonyl group, and preferably after subsequently activating the carbonyl group toward
reaction with the amine group) to thereby join the MOI to the linker.

When viewed in a chronological order, the invention is seen to take a tag
reactant (having a chemical tag handle and a tag variable component), a linker reactant
(having two chemical linker handles, a required labile moiety and 0-2 optional labile
moieties) and a MOI reactant (having a molecule of interest component and a chemical
molecule of interest handle) to form T-L-MOI. Thus, to form T-L-MOI, either the tag
reactant and the linker reactant are first reacted together to provide T-L-L_h, and then the
MOI reactant is reacted with T-L-L_h so as to provide T-L-MOI, or else (less preferably)
the linker reactant and the MOI reactant are reacted together first to provide L_h-L-MOI,
and then L_h-L-MOI is reacted with the tag reactant to provide T-L-MOI. For purposes
of convenience, compounds having the formula T-L-MOI will be described in terms of
the tag reactant, the linker reactant and the MOI reactant which may be used to form
such compounds. Of course, the same compounds of formula T-L-MOI could be
prepared by other (typically, more laborious) methods, and still fall within the scope of
the inventive T-L-MOI compounds.

In any event, the invention provides that a T-L-MOI compound be
subjected to cleavage conditions, such that a tag moiety is released from the remainder
of the compound. The tag moiety will comprise at least the tag variable component,
and will typically additionally comprise some or all of the atoms from the tag handle,
some or all of the atoms from the linker handle that was used to join the tag reactant to
the linker reactant, the optional labile moiety L^1 if this group was present in T-L-MOI,
and will perhaps contain some part of the required labile moiety L^2 depending on the
precise structure of L^2 and the nature of the cleavage chemistry. For convenience, the
tag moiety may be referred to as the T-containing moiety because T will typically
constitute the major portion (in terms of mass) of the tag moiety.

Given this introduction to one aspect of the present invention, the 
various components T, L and X will be described in detail. This description begins with 
the following definitions of certain terms, which will be used hereinafter in describing 
T, L and X. 

As used herein, the term "nucleic acid fragment" means a molecule
which is complementary to a selected target nucleic acid molecule (i.e., complementary
to all or a portion thereof), and may be derived from nature or synthetically or
recombinantly produced, including non-naturally occurring molecules, and may be in
double or single stranded form where appropriate; and includes an oligonucleotide (e.g.,
DNA or RNA), a primer, a probe, a nucleic acid analog (e.g., PNA), an oligonucleotide
which is extended in a 5' to 3' direction by a polymerase, a nucleic acid which is cleaved
chemically or enzymatically, a nucleic acid that is terminated with a dideoxy terminator
or capped at the 3' or 5' end with a compound that prevents polymerization at the 5' or 3'
end, and combinations thereof. The complementarity of a nucleic acid fragment to a
selected target nucleic acid molecule generally means the exhibition of at least about
70% specific base pairing throughout the length of the fragment. Preferably the nucleic
acid fragment exhibits at least about 80% specific base pairing; and most preferably at
least about 90%. Assays for determining the percent mismatch (and thus the percent
specific base pairing) are well known in the art and are based upon the percent
mismatch as a function of the Tm when referenced to the fully base paired control.
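
By way of illustration only, the percent specific base pairing between a fragment and an equal-length region of the target may be estimated by direct sequence comparison, as in the following minimal Python sketch. The function names, the DNA-only pairing table and the example sequences are assumptions made for illustration and do not appear in the specification. The Tm-based assays referred to above are not modeled, and the "about" 70%, 80% and 90% levels are treated as exact cut-offs for simplicity.

```python
# Watson-Crick partners for DNA only (an illustrative assumption;
# RNA or nucleic acid analogs would need a different table).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_base_pairing(fragment: str, target_region: str) -> float:
    """Return the percent of fragment positions that pair with the target.

    Both sequences are written 5'->3' and must have the same length.
    The fragment is compared against the target read in the antiparallel
    sense, i.e. against the reversed target string.
    """
    if not fragment or len(fragment) != len(target_region):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        COMPLEMENT.get(f) == t
        for f, t in zip(fragment.upper(), reversed(target_region.upper()))
    )
    return 100.0 * paired / len(fragment)


def preference_level(percent: float) -> str:
    """Map a percent base pairing onto the levels named in the text."""
    if percent >= 90:
        return "most preferred"
    if percent >= 80:
        return "preferred"
    if percent >= 70:
        return "complementary"
    return "below threshold"


if __name__ == "__main__":
    target = "AAGGCCTTAC"      # hypothetical target region, 5'->3'
    fragment = "GTAAGGCCAA"    # hypothetical fragment with two mismatches
    pct = percent_base_pairing(fragment, target)
    print(pct, preference_level(pct))
```

A positional comparison of this kind ignores gaps and partial overlaps; it is intended only to make the three complementarity levels concrete.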

As used herein, the term "alkyl," alone or in combination, refers to a
saturated, straight-chain or branched-chain hydrocarbon radical containing from 1 to 10,
preferably from 1 to 6 and more preferably from 1 to 4, carbon atoms. Examples of
such radicals include, but are not limited to, methyl, ethyl, n-propyl, iso-propyl, n-butyl,
iso-butyl, sec-butyl, tert-butyl, pentyl, iso-amyl, hexyl, decyl and the like. The term
"alkylene" refers to a saturated, straight-chain or branched chain hydrocarbon diradical
containing from 1 to 10, preferably from 1 to 6 and more preferably from 1 to 4, carbon
atoms. Examples of such diradicals include, but are not limited to, methylene, ethylene
(-CH2-CH2-), propylene, and the like.

The term "alkenyl," alone or in combination, refers to a straight-chain or 
branched-chain hydrocarbon radical having at least one carbon-carbon double bond in a 
total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon 
atoms. Examples of such radicals include, but are not limited to, ethenyl, E- and 
Z-propenyl, isopropenyl, E- and Z-butenyl, E- and Z-isobutenyl, E- and Z-pentenyl,
decenyl and the like. The term "alkenylene" refers to a straight-chain or branched-chain
hydrocarbon diradical having at least one carbon-carbon double bond in a total of from
2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms.
Examples of such diradicals include, but are not limited to, methylidene (=CH2),
ethylidene (-CH=CH-), propylene (-CH2-CH=CH-) and the like.

The term "alkynyl," alone or in combination, refers to a straight-chain or 
branched-chain hydrocarbon radical having at least one carbon-carbon triple bond in a 
total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon 
atoms. Examples of such radicals include, but are not limited to, ethynyl (acetylenyl), 
propynyl (propargyl), butynyl, hexynyl, decynyl and the like. The term "alkynylene", 
alone or in combination, refers to a straight-chain or branched-chain hydrocarbon 
diradical having at least one carbon-carbon triple bond in a total of from 2 to 10, 
preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. Examples of 
such radicals include, but are not limited to, ethynylene (-C≡C-), propynylene
(-CH2-C≡C-) and the like.

The term "cycloalkyl," alone or in combination, refers to a saturated,
cyclic arrangement of carbon atoms which number from 3 to 8 and preferably from 3 to
6, carbon atoms. Examples of such cycloalkyl radicals include, but are not limited to,
cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl and the like. The term
"cycloalkylene" refers to a diradical form of a cycloalkyl.

The term "cycloalkenyl," alone or in combination, refers to a cyclic 
carbocycle containing from 4 to 8, preferably 5 or 6, carbon atoms and one or more 
double bonds. Examples of such cycloalkenyl radicals include, but are not limited to, 
cyclopentenyl, cyclohexenyl, cyclopentadienyl and the like. The term
"cycloalkenylene" refers to a diradical form of a cycloalkenyl.

The term "aryl" refers to a carbocyclic (consisting entirely of carbon and
hydrogen) aromatic group selected from the group consisting of phenyl, naphthyl,
indenyl, indanyl, azulenyl, fluorenyl, and anthracenyl; or a heterocyclic aromatic group
selected from the group consisting of furyl, thienyl, pyridyl, pyrrolyl, oxazolyl,
thiazolyl, imidazolyl, pyrazolyl, 2-pyrazolinyl, pyrazolidinyl, isoxazolyl, isothiazolyl,
1,2,3-oxadiazolyl, 1,2,3-triazolyl, 1,3,4-thiadiazolyl, pyridazinyl, pyrimidinyl,
pyrazinyl, 1,3,5-triazinyl, 1,3,5-trithianyl, indolizinyl, indolyl, isoindolyl, 3H-indolyl,
indolinyl, benzo[b]furanyl, 2,3-dihydrobenzofuranyl, benzo[b]thiophenyl,
1H-indazolyl, benzimidazolyl, benzthiazolyl, purinyl, 4H-quinolizinyl, quinolinyl,
isoquinolinyl, cinnolinyl, phthalazinyl, quinazolinyl, quinoxalinyl, 1,8-naphthyridinyl,
pteridinyl, carbazolyl, acridinyl, phenazinyl, phenothiazinyl, and phenoxazinyl.

"Aryl" groups, as defined in this application, may independently contain
one to four substituents which are independently selected from the group consisting of
hydrogen, halogen, hydroxyl, amino, nitro, trifluoromethyl, trifluoromethoxy, alkyl,
alkenyl, alkynyl, cyano, carboxy, carboalkoxy, 1,2-dioxyethylene, alkoxy, alkenoxy or
alkynoxy, alkylamino, alkenylamino, alkynylamino, aliphatic or aromatic acyl,
alkoxy-carbonylamino, alkylsulfonylamino, morpholinocarbonylamino,
thiomorpholinocarbonylamino, N-alkyl guanidino, aralkylaminosulfonyl;
aralkoxyalkyl; N-aralkoxyurea; N-hydroxylurea; N-alkenylurea; N,N-(alkyl,
hydroxyl)urea; heterocyclyl; thioaryloxy-substituted aryl; N,N-(aryl, alkyl)hydrazino;
Ar'-substituted sulfonylheterocyclyl; aralkyl-substituted heterocyclyl; cycloalkyl and
cycloalkenyl-substituted heterocyclyl; cycloalkyl-fused aryl; aryloxy-substituted alkyl;
heterocyclylamino; aliphatic or aromatic acylaminocarbonyl; aliphatic or aromatic
acyl-substituted alkenyl; Ar'-substituted aminocarbonyloxy; Ar', Ar'-disubstituted aryl;
aliphatic or aromatic acyl-substituted acyl; cycloalkylcarbonylalkyl;
cycloalkyl-substituted amino; aryloxycarbonylalkyl; phosphorodiamidyl acid or ester.

"Ar'" is a carbocyclic or heterocyclic aryl group as defined above having
one to three substituents selected from the group consisting of hydrogen, halogen,
hydroxyl, amino, nitro, trifluoromethyl, trifluoromethoxy, alkyl, alkenyl, alkynyl,
1,2-dioxymethylene, 1,2-dioxyethylene, alkoxy, alkenoxy, alkynoxy, alkylamino,
alkenylamino or alkynylamino, alkylcarbonyloxy, aliphatic or aromatic acyl,
alkylcarbonylamino, alkoxycarbonylamino, alkylsulfonylamino, N-alkyl or N,N-dialkyl
urea.



The term "alkoxy," alone or in combination, refers to an alkyl ether 
radical, wherein the term "alkyl" is as defined above. Examples of suitable alkyl ether 
radicals include, but are not limited to, methoxy, ethoxy, n-propoxy, iso-propoxy, 
n-butoxy, iso-butoxy, sec-butoxy, tert-butoxy and the like.

The term "alkenoxy," alone or in combination, refers to a radical of 
formula alkenyl-O-, wherein the term "alkenyl" is as defined above provided that the 
radical is not an enol ether. Examples of suitable alkenoxy radicals include, but are not 
limited to, allyloxy, E- and Z-3-methyl-2-propenoxy and the like. 

The term "alkynyloxy," alone or in combination, refers to a radical of 
formula alkynyl-O-, wherein the term "alkynyl" is as defined above provided that the 
radical is not an ynol ether. Examples of suitable alkynoxy radicals include, but are not 
limited to, propargyloxy, 2-butynyloxy and the like. 

The term "thioalkoxy" refers to a thioether radical of formula alkyl-S-, 
wherein alkyl is as defined above. 

The term "alkylamino," alone or in combination, refers to a mono- or 
di-alkyl-substituted amino radical (i.e., a radical of formula alkyl-NH- or (alkyl)2-N-),
wherein the term "alkyl" is as defined above. Examples of suitable alkylamino radicals
include, but are not limited to, methylamino, ethylamino, propylamino, isopropylamino,
t-butylamino, N,N-diethylamino and the like.

The term "alkenylamino," alone or in combination, refers to a radical of
formula alkenyl-NH- or (alkenyl)2N-, wherein the term "alkenyl" is as defined above,
provided that the radical is not an enamine. An example of such alkenylamino radicals
is the allylamino radical.

The term "alkynylamino," alone or in combination, refers to a radical of
formula alkynyl-NH- or (alkynyl)2N-, wherein the term "alkynyl" is as defined above,
provided that the radical is not an ynamine. An example of such alkynylamino radicals
is the propargylamino radical.

The term "amide" refers to either -N(R')-C(=O)- or -C(=O)-N(R')-
where R' is defined herein to include hydrogen as well as other groups. The term
"substituted amide" refers to the situation where R' is not hydrogen, while the term
"unsubstituted amide" refers to the situation where R' is hydrogen.

The term "aryloxy," alone or in combination, refers to a radical of
formula aryl-O-, wherein aryl is as defined above. Examples of aryloxy radicals
include, but are not limited to, phenoxy, naphthoxy, pyridyloxy and the like.

The term "arylamino," alone or in combination, refers to a radical of
formula aryl-NH-, wherein aryl is as defined above. Examples of arylamino radicals
include, but are not limited to, phenylamino (anilido), naphthylamino, 2-, 3- and
4-pyridylamino and the like.

The term "aryl-fused cycloalkyl," alone or in combination, refers to a
cycloalkyl radical which shares two adjacent atoms with an aryl radical, wherein the
terms "cycloalkyl" and "aryl" are as defined above. An example of an aryl-fused
cycloalkyl radical is the benzofused cyclobutyl radical.

The term "alkylcarbonylamino," alone or in combination, refers to a
radical of formula alkyl-CONH-, wherein the term "alkyl" is as defined above.

The term "alkoxycarbonylamino," alone or in combination, refers to a
radical of formula alkyl-OCONH-, wherein the term "alkyl" is as defined above.

The term "alkylsulfonylamino," alone or in combination, refers to a
radical of formula alkyl-SO2NH-, wherein the term "alkyl" is as defined above.

The term "arylsulfonylamino," alone or in combination, refers to a
radical of formula aryl-SO2NH-, wherein the term "aryl" is as defined above.

The term "N-alkylurea," alone or in combination, refers to a radical of
formula alkyl-NH-CO-NH-, wherein the term "alkyl" is as defined above.

The term "N-arylurea," alone or in combination, refers to a radical of 
formula aryl-NH-CO-NH-, wherein the term "aryl" is as defined above. 

The term "halogen" means fluorine, chlorine, bromine and iodine.

The term "hydrocarbon radical" refers to an arrangement of carbon and
hydrogen atoms which need only a single hydrogen atom to be an independent stable
molecule. Thus, a hydrocarbon radical has one open valence site on a carbon atom,
through which the hydrocarbon radical may be bonded to other atom(s). Alkyl, alkenyl,
cycloalkyl, etc. are examples of hydrocarbon radicals.

The term "hydrocarbon diradical" refers to an arrangement of carbon and
hydrogen atoms which need two hydrogen atoms in order to be an independent stable
molecule. Thus, a hydrocarbon diradical has two open valence sites on one or two carbon
atoms, through which the hydrocarbon diradical may be bonded to other atom(s).
Alkylene, alkenylene, alkynylene, cycloalkylene, etc. are examples of hydrocarbon
diradicals.

The term "hydrocarbyl" refers to any stable arrangement consisting
entirely of carbon and hydrogen having a single valence site to which it is bonded to
another moiety, and thus includes radicals known as alkyl, alkenyl, alkynyl, cycloalkyl,
cycloalkenyl, aryl (without heteroatom incorporation into the aryl ring), arylalkyl,
alkylaryl and the like. Hydrocarbon radical is another name for hydrocarbyl.

The term "hydrocarbylene" refers to any stable arrangement consisting
entirely of carbon and hydrogen having two valence sites to which it is bonded to other
moieties, and thus includes alkylene, alkenylene, alkynylene, cycloalkylene,
cycloalkenylene, arylene (without heteroatom incorporation into the arylene ring),
arylalkylene, alkylarylene and the like. Hydrocarbon diradical is another name for
hydrocarbylene.

The term "hydrocarbyl-O-hydrocarbylene" refers to a hydrocarbyl group
bonded to an oxygen atom, where the oxygen atom is likewise bonded to a
hydrocarbylene group at one of the two valence sites at which the hydrocarbylene group
is bonded to other moieties. The terms "hydrocarbyl-S-hydrocarbylene",
"hydrocarbyl-NH-hydrocarbylene" and "hydrocarbyl-amide-hydrocarbylene" have equivalent
meanings, where oxygen has been replaced with sulfur, -NH- or an amide group,
respectively.

The term "N-(hydrocarbyl)hydrocarbylene" refers to a hydrocarbylene
group wherein one of the two valence sites is bonded to a nitrogen atom, and that
nitrogen atom is simultaneously bonded to a hydrogen and a hydrocarbyl group. The
term "N,N-di(hydrocarbyl)hydrocarbylene" refers to a hydrocarbylene group wherein one
of the two valence sites is bonded to a nitrogen atom, and that nitrogen atom is
simultaneously bonded to two hydrocarbyl groups.

The term "hydrocarbylacyl-hydrocarbylene" refers to a hydrocarbyl 
group bonded through an acyl (-C(=0)-) group to one of the two valence sites of a 
hydrocarbylene group. 

The terms "heterocyclylhydrocarbyl" and "heterocyclyl" refer to a stable,
cyclic arrangement of atoms which include carbon atoms and up to four atoms (referred
to as heteroatoms) selected from oxygen, nitrogen, phosphorus and sulfur. The cyclic
arrangement may be in the form of a monocyclic ring of 3-7 atoms, or a bicyclic ring of
8-11 atoms. The rings may be saturated or unsaturated (including aromatic rings), and
may optionally be benzofused. Nitrogen and sulfur atoms in the ring may be in any
oxidized form, including the quaternized form of nitrogen. A heterocyclylhydrocarbyl
may be attached at any endocyclic carbon or heteroatom which results in the creation of
a stable structure. Preferred heterocyclylhydrocarbyls include 5-7 membered
monocyclic heterocycles containing one or two nitrogen heteroatoms.

A substituted heterocyclylhydrocarbyl refers to a
heterocyclylhydrocarbyl as defined above, wherein at least one ring atom thereof is
bonded to an indicated substituent which extends off of the ring.

In referring to hydrocarbyl and hydrocarbylene groups, the term
"derivatives of any of the foregoing wherein one or more hydrogens is replaced with an
equal number of fluorides" refers to molecules that contain carbon, hydrogen and
fluoride atoms, but no other atoms.

The term "activated ester" is an ester that contains a "leaving group"
which is readily displaceable by a nucleophile, such as an amine, an alcohol or a thiol
nucleophile. Such leaving groups are well known and include, without limitation,
N-hydroxysuccinimide, N-hydroxybenzotriazole, halogen (halides), alkoxy including
tetrafluorophenolates, thioalkoxy and the like. The term "protected ester" refers to an
ester group that is masked or otherwise unreactive. See, e.g., Greene, "Protecting
Groups In Organic Synthesis."

In view of the above definitions, other chemical terms used throughout 
this application can be easily understood by those of skill in the art. Terms may be used 
alone or in any combination thereof. The preferred and more preferred chain lengths of 
the radicals apply to all such combinations. 

A. GENERATION OF TAGGED NUCLEIC ACID FRAGMENTS

As noted above, one aspect of the present invention provides a general
scheme for DNA sequencing which allows the use of more than 16 tags in each lane;
with continuous detection, the tags can be detected and the sequence read as the size
separation is occurring, just as with conventional fluorescence-based sequencing. This
scheme is applicable to any of the DNA sequencing techniques based on size separation
of tagged molecules. Suitable tags and linkers for use within the present invention, as
well as methods for sequencing nucleic acids, are discussed in more detail below.



1. Tags

"Tag", as used herein, generally refers to a chemical moiety which is
used to uniquely identify a "molecule of interest", and more specifically refers to the tag
variable component as well as whatever may be bonded most closely to it in any of the
tag reactant, tag component and tag moiety.

A tag which is useful in the present invention possesses several attributes:

1) It is capable of being distinguished from all other tags. This
discrimination from other chemical moieties can be based on the chromatographic
behavior of the tag (particularly after the cleavage reaction), its spectroscopic or
potentiometric properties, or some combination thereof. Spectroscopic methods by
which tags are usefully distinguished include mass spectroscopy (MS), infrared (IR),
ultraviolet (UV), and fluorescence, where MS, IR and UV are preferred, and MS most
preferred spectroscopic methods. Potentiometric amperometry is a preferred
potentiometric method.

2) The tag is capable of being detected when present at 10^-22 to 10^-6 mole.

3) The tag possesses a chemical handle through which it can be
attached to the MOI which the tag is intended to uniquely identify. The attachment may
be made directly to the MOI, or indirectly through a "linker" group.

4) The tag is chemically stable toward all manipulations to which it 
is subjected, including attachment and cleavage from the MOI, and any manipulations 
of the MOI while the tag is attached to it. 

5) The tag does not significantly interfere with the manipulations
performed on the MOI while the tag is attached to it. For instance, if the tag is attached
to an oligonucleotide, the tag must not significantly interfere with any hybridization or
enzymatic reactions (e.g., PCR sequencing reactions) performed on the oligonucleotide.
Similarly, if the tag is attached to an antibody, it must not significantly interfere with
antigen recognition by the antibody.

A tag moiety which is intended to be detected by a certain spectroscopic
or potentiometric method should possess properties which enhance the sensitivity and
specificity of detection by that method. Typically, the tag moiety will have those
properties because they have been designed into the tag variable component, which will
typically constitute the major portion of the tag moiety. In the following discussion, the
word "tag" typically refers to the tag moiety (i.e., the cleavage product that
contains the tag variable component), but can also be considered to refer to the tag
variable component itself because that is the portion of the tag moiety which is typically
responsible for providing the uniquely detectable properties. In compounds of the
formula T-L-X, the "T" portion will contain the tag variable component. Where the tag
variable component has been designed to be characterized by, e.g., mass spectrometry,
the "T" portion of T-L-X may be referred to as T^ms. Likewise, the cleavage product
from T-L-X that contains T may be referred to as the T^ms-containing moiety. The
following spectroscopic and potentiometric methods may be used to characterize
T^ms-containing moieties.



a. Characteristics of MS Tags 

Where a tag is analyzable by mass spectrometry (i.e., is a MS-readable
tag, also referred to herein as a MS tag or "T^ms-containing moiety"), the essential
feature of the tag is that it is able to be ionized. It is thus a preferred element in the
design of MS-readable tags to incorporate therein a chemical functionality which can
carry a positive or negative charge under conditions of ionization in the MS. This
feature confers improved efficiency of ion formation and greater overall sensitivity of
detection, particularly in electrospray ionization. The chemical functionality that
supports an ionized charge may derive from T^ms or L or both. Factors that can increase
the relative sensitivity of an analyte being detected by mass spectrometry are discussed
in, e.g., Sunner, J., et al., Anal. Chem. 60:1300-1307 (1988).

A preferred functionality to facilitate the carrying of a negative charge is
an organic acid, such as phenolic hydroxyl, carboxylic acid, phosphonate, phosphate,
tetrazole, sulfonyl urea, perfluoro alcohol and sulfonic acid.

Preferred functionality to facilitate the carrying of a positive charge
under ionization conditions are aliphatic or aromatic amines. Examples of amine
functional groups which give enhanced detectability of MS tags include quaternary
amines (i.e., amines that have four bonds, each to carbon atoms, see Aebersold, U.S.
Patent No. 5,240,859) and tertiary amines (i.e., amines that have three bonds, each to
carbon atoms, which includes C=N-C groups such as are present in pyridine, see Hess
et al., Anal. Biochem. 224:373, 1995; Bures et al., Anal. Biochem. 224:364, 1995).
Hindered tertiary amines are particularly preferred. Tertiary and quaternary amines may
be alkyl or aryl. A T^ms-containing moiety must bear at least one ionizable species, but
may possess more than one ionizable species. The preferred charge state is a single
ionized species per tag. Accordingly, it is preferred that each T^ms-containing moiety
(and each tag variable component) contain only a single hindered amine or organic acid
group.

Suitable amine-containing radicals that may form part of the
T^ms-containing moiety include the following:

[Chemical structures not reproduced. Legible fragments of the drawings include -O-(C2-C10)-N(C1-C10)2, -(C1-C10)-N, -C(=O)NH-(C2-C10)-N(C1-C10)2 and -C(=O)NH-(C2-C10)-, each shown attached to an amine group.]

The identification of a tag by mass spectrometry is preferably based
upon its molecular mass to charge ratio (m/z). The preferred molecular mass range of
MS tags is from about 100 to 2,000 daltons, and preferably the T^ms-containing moiety
has a mass of at least about 250 daltons, more preferably at least about 300 daltons, and
still more preferably at least about 350 daltons. It is generally difficult for mass
spectrometers to distinguish among moieties having parent ions below about 200-250
daltons (depending on the precise instrument), and thus preferred T^ms-containing
moieties of the invention have masses above that range.

As explained above, the T^ms-containing moiety may contain atoms other
than those present in the tag variable component, and indeed other than present in T^ms
itself. Accordingly, the mass of T^ms itself may be less than about 250 daltons, so long
as the T^ms-containing moiety has a mass of at least about 250 daltons. Thus, the mass
of T^ms may range from 15 (i.e., a methyl radical) to about 10,000 daltons, and
preferably ranges from 100 to about 5,000 daltons, and more preferably ranges from
about 200 to about 1,000 daltons.

It is relatively difficult to distinguish tags by mass spectrometry when
those tags incorporate atoms that have more than one isotope in significant abundance.
Accordingly, preferred T groups which are intended for mass spectroscopic
identification (T^ms groups) contain carbon, at least one of hydrogen and fluoride, and
optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. While
other atoms may be present in the T^ms, their presence can render analysis of the mass
spectral data somewhat more difficult. Preferably, the T^ms groups have only carbon,
nitrogen and oxygen atoms, in addition to hydrogen and/or fluoride.

Fluoride is an optional yet preferred atom to have in a T^ms group. In
comparison to hydrogen, fluoride is, of course, much heavier. Thus, the presence of
fluoride atoms rather than hydrogen atoms leads to T^ms groups of higher mass, thereby
allowing the T^ms group to reach and exceed a mass of greater than 250 daltons, which is
desirable as explained above. In addition, the replacement of hydrogen with fluoride
confers greater volatility on the T^ms-containing moiety, and greater volatility of the
analyte enhances sensitivity when mass spectrometry is being used as the detection
method.

The molecular formula of T^ms falls within the scope of
C_1-500 N_0-100 O_0-100 S_0-10 P_0-10 H_α F_β I_δ, wherein the sum of α, β and δ is
sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms.
The designation C_1-500 N_0-100 O_0-100 S_0-10 P_0-10 H_α F_β I_δ means that T^ms
contains at least one, and may contain any number
from 1 to 500 carbon atoms, in addition to optionally containing as many as 100
nitrogen atoms ("N_0-" means that T^ms need not contain any nitrogen atoms), and as
many as 100 oxygen atoms, and as many as 10 sulfur atoms and as many as 10
phosphorus atoms. The symbols α, β and δ represent the number of hydrogen, fluoride
and iodide atoms in T^ms, where any two of these numbers may be zero, and where the
sum of these numbers equals the total of the otherwise unsatisfied valencies of the C, N,
O, S and P atoms. Preferably, T^ms has a molecular formula that falls within the scope of
C_1-50 N_0-10 O_0-10 H_α F_β where the sum of α and β equals the number of hydrogen and
fluoride atoms, respectively, present in the moiety.
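
The composition limits and the preferred mass floor described above can be checked mechanically. The following Python sketch is offered only as an illustration: the function names, the dictionary layout and the example composition are hypothetical, the atomic masses are standard average values rounded to three decimals, and the valency condition on α, β and δ is not verified because it depends on the structure rather than on the composition alone.

```python
from typing import Dict, Optional, Tuple

# Average atomic masses in daltons (standard values, rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "S": 32.06, "P": 30.974, "F": 18.998, "I": 126.904}

Scope = Dict[str, Tuple[int, Optional[int]]]  # element -> (min, max or None)

# General scope: C 1-500, N 0-100, O 0-100, S 0-10, P 0-10, plus H, F and I.
GENERAL_SCOPE: Scope = {"C": (1, 500), "N": (0, 100), "O": (0, 100),
                        "S": (0, 10), "P": (0, 10),
                        "H": (0, None), "F": (0, None), "I": (0, None)}

# Preferred scope: C 1-50, N 0-10, O 0-10, plus H and F only.
PREFERRED_SCOPE: Scope = {"C": (1, 50), "N": (0, 10), "O": (0, 10),
                          "H": (0, None), "F": (0, None)}

# Preferred minimum mass of the tag-containing moiety, per the text.
MIN_MOIETY_MASS = 250.0


def in_scope(composition: Dict[str, int], scope: Scope) -> bool:
    """True if every atom count lies inside the given formula scope."""
    # Reject any element the scope does not mention (e.g. chlorine).
    for element, count in composition.items():
        if count > 0 and element not in scope:
            return False
    # Check the lower and upper bound for each element in the scope.
    for element, (low, high) in scope.items():
        count = composition.get(element, 0)
        if count < low or (high is not None and count > high):
            return False
    # At least one of the monovalent atoms (H, F, I) must be present.
    monovalent = sum(composition.get(e, 0) for e in ("H", "F", "I") if e in scope)
    return monovalent > 0


def average_mass(composition: Dict[str, int]) -> float:
    """Average molecular mass in daltons for a composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


if __name__ == "__main__":
    # Hypothetical fluorinated tag composition, for illustration only.
    tag = {"C": 14, "H": 10, "F": 9, "N": 1, "O": 2}
    print(in_scope(tag, GENERAL_SCOPE), in_scope(tag, PREFERRED_SCOPE))
    print(round(average_mass(tag), 3), average_mass(tag) >= MIN_MOIETY_MASS)
```

In this hypothetical composition the nine fluoride atoms contribute a large share of the mass, which is consistent with the observation above that fluoride substitution helps a tag exceed 250 daltons.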
b. Characteristics of IR Tags

There are two primary forms of IR detection of organic chemical groups:
Raman scattering IR and absorption IR. Raman scattering IR spectra and absorption IR
spectra are complementary spectroscopic methods. In general, Raman excitation
depends on bond polarizability changes whereas IR absorption depends on bond dipole
moment changes. Weak IR absorption lines become strong Raman lines and vice versa.
Wavenumber is the characteristic unit for IR spectra. There are 3 spectral regions for IR
tags which have separate applications: near IR at 12500 to 4000 cm⁻¹, mid IR at 4000
to 600 cm⁻¹, and far IR at 600 to 30 cm⁻¹. For the uses described herein where a compound
is to serve as a tag to identify an MOI, probe or primer, the mid spectral region would
be preferred. For example, the carbonyl stretch (1850 to 1750 cm⁻¹) would be measured
for carboxylic acids, carboxylic esters and amides, and alkyl and aryl carbonates,
carbamates and ketones. N-H bending (1750 to 160 cm⁻¹) would be used to identify
amines, ammonium ions, and amides. At 1400 to 1250 cm⁻¹, R-OH bending is detected
as well as the C-N stretch in amides. Aromatic substitution patterns are detected at 900
to 690 cm⁻¹ (C-H bending, N-H bending for ArNH2). Saturated C-H, olefins, aromatic
rings, double and triple bonds, esters, acetals, ketals, ammonium salts, N-O compounds
such as oximes, nitro, N-oxides, and nitrates, azo, hydrazones, quinones, carboxylic
acids, amides, and lactams all possess vibrational infrared correlation data (see Pretsch
et al., Spectral Data for Structure Determination of Organic Compounds,
Springer-Verlag, New York, 1989). Preferred compounds would include an aromatic nitrile
which exhibits a very strong nitrile stretching vibration at 2230 to 2210 cm⁻¹. Other
useful types of compounds are aromatic alkynes which have a strong stretching
vibration that gives rise to a sharp absorption band between 2140 and 2100 cm⁻¹. A
third compound type is the aromatic azides which exhibit an intense absorption band in
the 2160 to 2120 cm⁻¹ region. Thiocyanates are representative of compounds that have
a strong absorption at 2275 to 2263 cm⁻¹.
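
The preferred tag bands listed above can be collected into a small lookup, which makes it easy to see which tag classes a measured peak is compatible with. The following is only an illustrative sketch: the band limits are copied from the text, while the dictionary layout and the function names `candidate_tags` and `wavenumber_to_wavelength_um` are assumptions introduced here and are not part of the described method.

```python
# Band limits in cm^-1, taken from the preferred IR tag classes in the text.
# The data structure and function names are hypothetical, for illustration only.
IR_TAG_BANDS = {
    "aromatic nitrile": (2210, 2230),
    "aromatic alkyne": (2100, 2140),
    "aromatic azide": (2120, 2160),
    "thiocyanate": (2263, 2275),
}


def wavenumber_to_wavelength_um(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    return 1.0e4 / wavenumber_cm1


def candidate_tags(peak_cm1):
    """Return the tag classes whose listed band contains the peak position."""
    return sorted(
        name
        for name, (low, high) in IR_TAG_BANDS.items()
        if low <= peak_cm1 <= high
    )


if __name__ == "__main__":
    for peak in (2220, 2130, 2270):
        print(peak, candidate_tags(peak))
    print(wavenumber_to_wavelength_um(4000))
```

Note that the alkyne and azide bands quoted in the text overlap between 2120 and 2140 cm⁻¹, so a peak in that window is compatible with both classes under this simple lookup.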
c. Characteristics of UV Tags

A compilation of organic chromophore types and their respective UV-visible
properties is given in Scott (Interpretation of the UV Spectra of Natural
Products, Pergamon Press, New York, 1962). A chromophore is an atom or group of
atoms or electrons that are responsible for the particular light absorption. Empirical
rules exist for the π to π* maxima in conjugated systems (see Pretsch et al., Spectral
Data for Structure Determination of Organic Compounds, p. B65 and B70,
Springer-Verlag, New York, 1989). Preferred compounds (with conjugated systems) would
possess n to π* and π to π* transitions. Such compounds are exemplified by Acid
Violet 7, Acridine Orange, Acridine Yellow G, Brilliant Blue, Congo Red, Crystal
Violet, Malachite Green oxalate, Metanil Yellow, Methylene Blue, Methyl Orange,
Methyl Violet B, Naphthol Green B, Oil Blue N, Oil Red O, 4-phenylazophenol,
Safranine O, Solvent Green 3, and Sudan Orange G, all of which are commercially
available (Aldrich, Milwaukee, WI). Other suitable compounds are listed in, e.g., Jane,
I., et al., J. Chrom. 323:191-225 (1985).

d. Characteristic of a Fluorescent Tag

Fluorescent probes are identified and quantitated most directly by their
absorption and fluorescence emission wavelengths and intensities. Emission spectra
(fluorescence and phosphorescence) are much more sensitive and permit more specific
measurements than absorption spectra. Other photophysical characteristics such as
excited-state lifetime and fluorescence anisotropy are less widely used. The most
generally useful intensity parameters are the molar extinction coefficient (ε) for
absorption and the quantum yield (QY) for fluorescence. The value of ε is specified at a
single wavelength (usually the absorption maximum of the probe), whereas QY is a
measure of the total photon emission over the entire fluorescence spectral profile. A
narrow optical bandwidth (<20 nm) is usually used for fluorescence excitation (via
absorption), whereas the fluorescence detection bandwidth is much more variable,
ranging from full spectrum for maximal sensitivity to narrow band (~20 nm) for
maximal resolution. Fluorescence intensity per probe molecule is proportional to the
product of ε and QY. The range of these parameters among fluorophores of current
practical importance is approximately 10,000 to 100,000 cm⁻¹M⁻¹ for ε and 0.1 to 1.0 for
QY. Compounds that can serve as fluorescent tags are as follows: fluorescein,
rhodamine, lambda blue 470, lambda green, lambda red 664, lambda red 665, acridine
orange, and propidium iodide, which are commercially available from Lambda
Fluorescence Co. (Pleasant Gap, PA). Fluorescent compounds such as nile red, Texas
Red, lissamine™ and BODIPY™s are available from Molecular Probes (Eugene, OR).
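
Because the intensity per probe molecule is proportional to the product of the extinction coefficient and the quantum yield, the parameter ranges quoted above imply a wide spread in brightness between the weakest and strongest practical fluorophores. A minimal sketch of that arithmetic follows; the function name `relative_brightness` is an assumption introduced here, and the two example inputs are simply the extremes of the ranges stated in the text rather than values for any particular dye.

```python
def relative_brightness(extinction_coefficient, quantum_yield):
    """Return the product of extinction coefficient (cm^-1 M^-1) and quantum yield.

    The result is only proportional to the intensity per probe molecule;
    instrument factors are deliberately left out of this sketch.
    """
    if extinction_coefficient < 0:
        raise ValueError("extinction coefficient must be non-negative")
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return extinction_coefficient * quantum_yield


if __name__ == "__main__":
    # Extremes of the ranges quoted in the text, not data for a specific dye.
    weakest = relative_brightness(10_000, 0.1)
    strongest = relative_brightness(100_000, 1.0)
    print(weakest, strongest, strongest / weakest)
```

Under these assumptions the brightest and dimmest ends of the quoted ranges differ by roughly two orders of magnitude.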
e. Characteristics of Potentiometric Tags

The principle of electrochemical detection (ECD) is based on the oxidation
or reduction of compounds: at certain applied voltages, electrons are either
donated or accepted, thus producing a current which can be measured. When certain
compounds are subjected to a potential difference, the molecules undergo a molecular
rearrangement at the working electrode's surface with the loss (oxidation) or gain
(reduction) of electrons; such compounds are said to be electroactive and undergo
electrochemical reactions. EC detectors apply a voltage at an electrode surface over
which the HPLC eluent flows. Electroactive compounds eluting from the column either
donate electrons (oxidize) or acquire electrons (reduce), generating a current peak in real
time. Importantly, the amount of current generated depends on both the concentration of
the analyte and the voltage applied, with each compound having a specific voltage at
which it begins to oxidize or reduce. The currently most popular electrochemical
detector is the amperometric detector, in which the potential is kept constant and the
current produced from the electrochemical reaction is then measured. This type of
spectrometry is currently called "potentiostatic amperometry". Commercial
amperometric detectors are available from ESA, Inc., Chelmsford, MA.

When the efficiency of detection is 100%, the specialized detectors are
termed "coulometric". Coulometric detectors are sensitive and have a number of
practical advantages with regard to selectivity and sensitivity which make these types of
detectors useful in an array. In coulometric detectors, for a given concentration of
analyte, the signal current is plotted as a function of the applied potential (voltage) to
the working electrode. The resultant sigmoidal graph is called the current-voltage curve
or hydrodynamic voltammogram (HDV). The HDV allows the best choice of applied
potential to the working electrode that permits one to maximize the observed signal. A
major advantage of ECD is its inherent sensitivity, with current levels of detection in the
subfemtomole range.

Numerous chemicals and compounds are electrochemically active
including many biochemicals, pharmaceuticals and pesticides. Chromatographically
coeluting compounds can be effectively resolved even if their half-wave potentials (the
potential at half signal maximum) differ by only 30-60 mV.

Recently developed coulometric sensors provide selectivity,
identification and resolution of co-eluting compounds when used as detectors in liquid
chromatography based separations. Therefore, these arrayed detectors add another set of
separations accomplished in the detector itself. Current instruments possess 16 channels
which are in principle limited only by the rate at which data can be acquired. The
number of compounds which can be resolved on the EC array is chromatographically
limited (i.e., plate count limited). However, if two or more compounds that
chromatographically co-elute have a difference in half wave potentials of 30-60 mV,
the array is able to distinguish the compounds. The ability of a compound to be
electrochemically active relies on the possession of an EC active group (i.e., -OH, -O,
-N, -S).
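
The idea that two co-eluting compounds can be told apart when their half-wave potentials differ by a few tens of millivolts can be illustrated with a toy model of the hydrodynamic voltammogram. The sketch below is not the instrument's actual response function: the logistic curve shape, the 15 mV steepness parameter, the channel potentials and the function names `hdv_current` and `array_response` are all assumptions chosen only to reproduce a sigmoid whose half-maximum falls at the half-wave potential.

```python
import math


def hdv_current(potential_mv, half_wave_mv, limiting_current=1.0, slope_mv=15.0):
    """Toy sigmoidal current-voltage curve.

    The current equals half of limiting_current when the applied potential
    equals the half-wave potential. slope_mv is an assumed steepness value.
    """
    exponent = -(potential_mv - half_wave_mv) / slope_mv
    return limiting_current / (1.0 + math.exp(exponent))


def array_response(half_wave_mv, potentials_mv):
    """Evaluate the toy curve at each channel potential of an electrode array."""
    return [hdv_current(p, half_wave_mv) for p in potentials_mv]


if __name__ == "__main__":
    # Sixteen hypothetical channel potentials, 100 mV to 700 mV in 40 mV steps.
    channels = [100 + 40 * i for i in range(16)]
    compound_a = array_response(400.0, channels)  # assumed half-wave potential
    compound_b = array_response(460.0, channels)  # 60 mV higher
    gaps = [abs(a - b) for a, b in zip(compound_a, compound_b)]
    best = max(range(len(channels)), key=lambda i: gaps[i])
    print("largest difference at", channels[best], "mV:", round(gaps[best], 3))
```

In this simplified picture the two response profiles differ most on the channels lying between the two half-wave potentials, which is the intuition behind resolving co-eluting compounds across an array.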

Compounds which have been successfully detected using coulometric
detectors include 5-hydroxytryptamine, 3-methoxy-4-hydroxyphenyl-glycol,
homogentisic acid, dopamine, metanephrine, 3-hydroxykynurenine, acetaminophen,
3-hydroxytryptophol, 5-hydroxyindoleacetic acid, octanesulfonic acid, phenol, o-cresol,
pyrogallol, 2-nitrophenol, 4-nitrophenol, 2,4-dinitrophenol, 4,6-dinitrocresol,
3-methyl-2-nitrophenol, 2,4-dichlorophenol, 2,6-dichlorophenol, 2,4,5-trichlorophenol,
4-chloro-3-methylphenol, 5-methylphenol, 4-methyl-2-nitrophenol, 2-hydroxyaniline,
4-hydroxyaniline, 1,2-phenylenediamine, benzocatechin, buturon, chlortholuron, diuron,
isoproturon, linuron, methobromuron, metoxuron, monolinuron, monuron, methionine,
tryptophan, tyrosine, 4-aminobenzoic acid, 4-hydroxybenzoic acid, 4-hydroxycoumaric
acid, 7-methoxycoumarin, apigenin, baicalein, caffeic acid, catechin, centaurein,
chlorogenic acid, daidzein, datiscetin, diosmetin, epicatechin gallate, epigallo catechin,
epigallo catechin gallate, eugenol, eupatorin, ferulic acid, fisetin, galangin, gallic acid,
gardenin, genistein, gentisic acid, hesperidin, irigenin, kaempferol, leucocyanidin,
luteolin, mangostin, morin, myricetin, naringin, narirutin, pelargonidin, peonidin,
phloretin, pratensein, protocatechuic acid, rhamnetin, quercetin, sakuranetin,
scutellarein, scopoletin, syringaldehyde, syringic acid, tangeritin, troxerutin,
umbelliferone, vanillic acid, 1,3-dimethyl tetrahydroisoquinoline, 6-hydroxydopamine,
r-salsolinol, N-methyl-r-salsolinol, tetrahydroisoquinoline, amitriptyline, apomorphine,
capsaicin, chlordiazepoxide, chlorpromazine, daunorubicin, desipramine, doxepin,
fluoxetine, flurazepam, imipramine, isoproterenol, methoxamine, morphine,
morphine-3-glucuronide, nortriptyline, oxazepam, phenylephrine, trimipramine, ascorbic acid,
N-acetyl serotonin, 3,4-dihydroxybenzylamine, 3,4-dihydroxymandelic acid (DOMA),
3,4-dihydroxyphenylacetic acid (DOPAC), 3,4-dihydroxyphenylalanine (L-DOPA),
3,4-dihydroxyphenylglycol (DHPG), 3-hydroxyanthranilic acid, 2-hydroxyphenylacetic
acid (2HPAC), 4-hydroxybenzoic acid (4HBAC), 5-hydroxyindole-3-acetic acid
(5HIAA), 3-hydroxykynurenine, 3-hydroxymandelic acid,
3-hydroxy-4-methoxyphenylethylamine, 4-hydroxyphenylacetic acid (4HPAC),
4-hydroxyphenyllactic acid (4HPLA), 5-hydroxytryptophan (5HTP),
5-hydroxytryptophol (5HTOL), 5-hydroxytryptamine (5HT), 5-hydroxytryptamine
sulfate, 3-methoxy-4-hydroxyphenylglycol (MHPG), 5-methoxytryptamine,
5-methoxytryptophan, 5-methoxytryptophol, 3-methoxytyramine (3MT),
3-methoxytyrosine (3-OM-DOPA), 5-methylcysteine, 3-methylguanine, bufotenin,
dopamine, dopamine-3-glucuronide, dopamine-3-sulfate, dopamine-4-sulfate,
epinephrine, epinine, folic acid, glutathione (reduced), guanine, guanosine,
homogentisic acid (HGA), homovanillic acid (HVA), homovanillyl alcohol (HVOL),
homoveratric acid, HVA sulfate, hypoxanthine, indole, indole-3-acetic acid,
indole-3-lactic acid, kynurenine, melatonin, metanephrine, N-methyltryptamine,
N-methyltyramine, N,N-dimethyltryptamine, N,N-dimethyltyramine, norepinephrine,
normetanephrine, octopamine, pyridoxal, pyridoxal phosphate, pyridoxamine,
synephrine, tryptophol, tryptamine, tyramine, uric acid, vanillylmandelic acid (VMA),
xanthine and xanthosine. Other suitable compounds are set forth in, e.g., Jane, I., et al.,
J. Chrom. 323:191-225 (1985) and Musch, G., et al., J. Chrom. 348:97-110 (1985).
These compounds can be incorporated into compounds of formula T-L-X by methods
known in the art. For example, compounds having a carboxylic acid group may be
reacted with amine, hydroxyl, etc. to form amide, ester and other linkages between T
and L.

In addition to the above properties, and regardless of the intended
detection method, it is preferred that the tag have a modular chemical structure. This
aids in the construction of large numbers of structurally related tags using the
techniques of combinatorial chemistry. For example, the T^ms group desirably has
several properties. It desirably contains a functional group which supports a single
ionized charge state when the T^ms-containing moiety is subjected to mass spectrometry
(more simply referred to as a "mass spec sensitivity enhancer" group, or MSSE). Also,
it desirably can serve as one member in a family of T^ms-containing moieties, where
members of the family each have a different mass/charge ratio, but have
approximately the same sensitivity in the mass spectrometer. Thus, the members of the
family desirably have the same MSSE. In order to allow the creation of families of
compounds, it has been found convenient to generate tag reactants via a modular
synthesis scheme, so that the tag components themselves may be viewed as comprising
modules.

In a preferred modular approach to the structure of the T^ms group, T^ms
has the formula

T²-(J-T³-)n-

wherein T² is an organic moiety formed from carbon and one or more of hydrogen,
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass range of 15 to
500 daltons; T³ is an organic moiety formed from carbon and one or more of hydrogen,
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass range of 50 to
1000 daltons; J is a direct bond or a functional group such as amide, ester, amine,
sulfide, ether, thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate,
Schiff base, reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate,
phosphoramide, phosphonamide, sulfonate, sulfonamide or carbon-carbon bond; and n
is an integer ranging from 1 to 50, such that when n is greater than 1, each T³ and J is
independently selected.

The modular structure T²-(J-T³)n- provides a convenient entry to families
of T-L-X compounds, where each member of the family has a different T group. For
instance, when T is T^ms, and each family member desirably has the same MSSE, one of
the T³ groups can provide that MSSE structure. In order to provide variability between
members of a family in terms of the mass of T^ms, the T² group may be varied among
family members. For instance, one family member may have T² = methyl, while
another has T² = ethyl, and another has T² = propyl, etc.
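
To make the methyl, ethyl, propyl example concrete, the mass step between successive family members can be computed from standard average atomic masses. This is only a sketch: the helper name `alkyl_mass` is an assumption introduced here, the atomic masses are rounded standard values, and only the unbranched alkyl group CnH(2n+1) standing in for the variable group is considered, not the rest of the tag.

```python
# Rounded standard average atomic masses in daltons.
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def alkyl_mass(n_carbons):
    """Average mass of an alkyl group CnH(2n+1), e.g. methyl for n_carbons=1."""
    if n_carbons < 1:
        raise ValueError("an alkyl group needs at least one carbon")
    hydrogens = 2 * n_carbons + 1
    return n_carbons * ATOMIC_MASS["C"] + hydrogens * ATOMIC_MASS["H"]


if __name__ == "__main__":
    for name, n in (("methyl", 1), ("ethyl", 2), ("propyl", 3)):
        print(name, round(alkyl_mass(n), 3))
```

Each added CH2 unit shifts the mass by about 14 daltons, so a homologous series of this kind gives closely spaced, evenly stepped tag masses.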

In order to provide "gross" or large jumps in mass, a T³ group may be
designed which adds significant (e.g., one or several hundreds) of mass units to T-L-X.
Such a T³ group may be referred to as a molecular weight range adjuster
group ("WRA"). A WRA is quite useful if one is working with a single set of T² groups,
which will have masses extending over a limited range. A single set of T² groups may
be used to create T^ms groups having a wide range of mass simply by incorporating one
or more WRA T³ groups into the T^ms. Thus, using a simple example, if a set of T²
groups affords a mass range of 250-340 daltons for the T^ms, the addition of a single
WRA, having, as an exemplary number, 100 daltons, as a T³ group provides access to the
mass range of 350-440 daltons while using the same set of T² groups. Similarly, the
addition of two 100 dalton WRA groups (each as a T³ group) provides access to the
mass range of 450-540 daltons, where this incremental addition of WRA groups can be
continued to provide access to a very large mass range for the T^ms group. Preferred
compounds of the formula T²-(J-T³-)n-L-X have the formula R_VWC-(R_WRA)w-R_MSSE-L-X
where VWC is a "T²" group, and each of the WRA and MSSE groups are "T³" groups.
This structure is illustrated in Figure 13, and represents one modular approach to the
preparation of T^ms.
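
The mass-range arithmetic in the example above can be written out directly. The sketch below uses the numbers given in the text (a 250-340 dalton base range and a 100 dalton adjuster); the function name `tag_mass_range` and its parameters are assumptions introduced for illustration.

```python
def tag_mass_range(base_range, n_wra, wra_mass=100):
    """Shift a (low, high) tag mass range by n_wra weight range adjuster groups.

    base_range is the range afforded by the variable groups alone, in daltons.
    wra_mass defaults to the exemplary 100 dalton adjuster used in the text.
    """
    if n_wra < 0:
        raise ValueError("number of WRA groups cannot be negative")
    low, high = base_range
    shift = n_wra * wra_mass
    return (low + shift, high + shift)


if __name__ == "__main__":
    base = (250, 340)
    for count in range(4):
        print(count, "WRA group(s):", tag_mass_range(base, count))
```

Because each shifted window starts just above the previous one, the same set of variable groups can be reused at every level without the mass ranges colliding.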
In the formula T²-(J-T³-)n-, T² and T³ are preferably selected from
hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl-S-hydrocarbylene,
hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-amide-hydrocarbylene,
N-(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene,
hydrocarbylacyl-hydrocarbylene, heterocyclylhydrocarbyl wherein the heteroatom(s) are selected from
oxygen, nitrogen, sulfur and phosphorus, substituted heterocyclylhydrocarbyl wherein
the heteroatom(s) are selected from oxygen, nitrogen, sulfur and phosphorus and the
substituents are selected from hydrocarbyl, hydrocarbyl-O-hydrocarbylene,
hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-S-hydrocarbylene,
N-(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene and
hydrocarbylacyl-hydrocarbylene. In addition, T² and/or T³ may be a derivative of any
of the previously listed potential T² / T³ groups, such that one or more hydrogens are
replaced with fluorides.

Also regarding the formula T²-(J-T³-)n-, a preferred T³ has the
formula -G(R²)-, wherein G is a C1-6 alkylene chain having a single R² substituent.
Thus, if G is ethylene (-CH2-CH2-) either one of the two ethylene carbons may have
a R² substituent, and R² is selected from alkyl, alkenyl, alkynyl, cycloalkyl,
aryl-fused cycloalkyl, cycloalkenyl, aryl, aralkyl, aryl-substituted alkenyl or
alkynyl, cycloalkyl-substituted alkyl, cycloalkenyl-substituted cycloalkyl, biaryl,
alkoxy, alkenoxy, alkynoxy, aralkoxy, aryl-substituted alkenoxy or alkynoxy,
alkylamino, alkenylamino or alkynylamino, aryl-substituted alkylamino,
aryl-substituted alkenylamino or alkynylamino, aryloxy, arylamino,
N-alkylurea-substituted alkyl, N-arylurea-substituted alkyl,
alkylcarbonylamino-substituted alkyl, aminocarbonyl-substituted alkyl,
heterocyclyl, heterocyclyl-substituted alkyl, heterocyclyl-substituted amino,
carboxyalkyl substituted aralkyl, oxocarbocyclyl-fused aryl and heterocyclylalkyl;
cycloalkenyl, aryl-substituted alkyl and, aralkyl, hydroxy-substituted alkyl,
alkoxy-substituted alkyl, aralkoxy-substituted alkyl, alkoxy-substituted alkyl,
aralkoxy-substituted alkyl, amino-substituted alkyl, (aryl-substituted
alkyloxycarbonylamino)-substituted alkyl, thiol-substituted alkyl,
alkylsulfonyl-substituted alkyl, (hydroxy-substituted alkylthio)-substituted alkyl,
thioalkoxy-substituted alkyl, hydrocarbylacylamino-substituted alkyl,
heterocyclylacylamino-substituted alkyl, hydrocarbyl-substituted-heterocyclylacylamino-substituted alkyl,
alkylsulfonylamino-substituted alkyl, arylsulfonylamino-substituted alkyl,
morpholino-alkyl, thiomorpholino-alkyl, morpholino carbonyl-substituted alkyl,
thiomorpholinocarbonyl-substituted alkyl, [N-(alkyl, alkenyl or alkynyl)- or N,N-
[dialkyl, dialkenyl, dialkynyl or (alkyl, alkenyl)-amino]carbonyl-substituted alkyl,
heterocyclylaminocarbonyl, heterocylylalkyleneaminocarbonyl,
heterocyclylaminocarbonyl-substituted alkyl,
heterocylylalkyleneaminocarbonyl-substituted alkyl, N,N-[dialkyl]alkyleneaminocarbonyl,
N,N-[dialkyl]alkyleneaminocarbonyl-substituted alkyl, alkyl-substituted
heterocyclylcarbonyl, alkyl-substituted heterocyclylcarbonyl-alkyl,
carboxyl-substituted alkyl, dialkylamino-substituted acylaminoalkyl and amino acid side
chains selected from arginine, asparagine, glutamine, S-methyl cysteine, methionine
and corresponding sulfoxide and sulfone derivatives thereof, glycine, leucine,
isoleucine, allo-isoleucine, tert-leucine, norleucine, phenylalanine, tyrosine,
tryptophan, proline, alanine, ornithine, histidine, glutamine, valine, threonine,
serine, aspartic acid, beta-cyanoalanine, and allothreonine; alkynyl and
heterocyclylcarbonyl, aminocarbonyl, amido, mono- or dialkylaminocarbonyl,
mono- or diarylaminocarbonyl, alkylarylaminocarbonyl, diarylaminocarbonyl,
mono- or diacylaminocarbonyl, aromatic or aliphatic acyl, alkyl optionally
substituted by substituents selected from amino, carboxy, hydroxy, mercapto, mono-
or dialkylamino, mono- or diarylamino, alkylarylamino, diarylamino, mono- or
diacylamino, alkoxy, alkenoxy, aryloxy, thioalkoxy, thioalkenoxy, thioalkynoxy,
thioaryloxy and heterocyclyl.

A preferred compound of the formula T²-(J-T³-)n-L-X has the structure:

[structure not reproduced]

wherein G is (CH2)1-6 such that a hydrogen on one and only one of the CH2 groups
represented by a single "G" is replaced with -(CH2)c-Amide-T⁴; T² and T⁴ are organic
moieties of the formula C1-25N0-9O0-9HαFβ such that the sum of α and β is sufficient to
satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; Amide is
-N(R¹)-C(=O)- or -C(=O)-N(R¹)-; R¹ is hydrogen or C1-10 alkyl; c is an integer ranging
from 0 to 4; and n is an integer ranging from 1 to 50 such that when n is greater than 1,
G, c, Amide, R¹ and T⁴ are independently selected.

In a further preferred embodiment, a compound of the formula
T²-(J-T³-)n-L-X has the structure:

[structure not reproduced; the surviving fragments show Amide, (CH2)c and R¹ groups]

wherein T⁵ is an organic moiety of the formula C1-25N0-9O0-9HαFβ such that the sum of α
and β is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O
atoms; and T⁵ includes a tertiary or quaternary amine or an organic acid; m is an integer
ranging from 0-49, and T², T⁴, R¹, L and X have been previously defined.

Another preferred compound having the formula T²-(J-T³-)n-L-X has the
particular structure:

[structure not reproduced]

wherein T⁵ is an organic moiety of the formula C1-25N0-9O0-9HαFβ such that the sum of α
and β is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O
atoms; and T⁵ includes a tertiary or quaternary amine or an organic acid; m is an integer
ranging from 0-49, and T², T⁴, c, R¹, "Amide", L and X have been previously defined.

In the above structures that have a T⁵ group, -Amide-T⁵ is preferably
one of the following, which are conveniently made by reacting organic acids with free
amino groups extending from "G":

[structures not reproduced; the surviving fragments show -NHC(=O)- groups, including
-NHC(=O)-(C1-C10)- and -O-(C2-C10)-N(C1-C10)2 fragments]
Where the above compounds have a T⁵ group, and the "G" group has a
free carboxyl group (or reactive equivalent thereof), then the following are preferred
-Amide-T⁵ groups, which may conveniently be prepared by reacting the appropriate
organic amine with a free carboxyl group extending from a "G" group:

[structures not reproduced; the surviving fragments show -C(=O)NH-(C1-C10)- and
-C(=O)NH-(C2-C10)-N groups]

In three preferred embodiments of the invention, T-L-MOI has the
structure:

[structure not reproduced; it terminates in (C1-C10)-ODN-3'-OH]

or the structure:

[structure not reproduced; it shows T⁴, Amide and (CH2)c groups and terminates in
(C1-C10)-ODN-3'-OH]

or the structure:

[structure not reproduced; it terminates in (C1-C10)-ODN-3'-OH]

wherein T² and T⁴ are organic moieties of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ such
that the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of
the C, N, O, S and P atoms; G is (CH2)1-6 wherein one and only one hydrogen on the
CH2 groups represented by each G is replaced with -(CH2)c-Amide-T⁴; Amide is
-N(R¹)-C(=O)- or -C(=O)-N(R¹)-; R¹ is hydrogen or C1-10 alkyl; c is an integer ranging
from 0 to 4; "C2-C10" represents a hydrocarbylene group having from 2 to 10 carbon
atoms, "ODN-3'-OH" represents a nucleic acid fragment having a terminal 3' hydroxyl
group (i.e., a nucleic acid fragment joined to (C1-C10) at other than the 3' end of the
nucleic acid fragment); and n is an integer ranging from 1 to 50 such that when n is
greater than 1, then G, c, Amide, R¹ and T⁴ are independently selected. Preferably there
are not three heteroatoms bonded to a single carbon atom.
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wherein T 1 and T 4 are organic moieties of the formula WWM -* *- *» 
sum of a and 0 is sufficient » satisfy rhe otherwise unsaved va.encies of d- C. N, 
and 0 atoms; 0 is ( CH,)„ herein one and on.y one hydrogen on the CH groups 
mpreserded by each G is rep.aceu *"* 15 

O ° 
_ N -C-or -C-N-; 

^1 R' r> is hydrogen or C,.,„ alkyl, c is an integer rangtng 

5 from 0 ,o 4 -ODW-OH" represents a nucleic acid fragment having a terminal V 

. • „„ in ,„„ ^ne from I to 50 such that when n is greater 
hydroxyl group; and n is an integer ranging no 

than 1. G, C Amide, R' and T* are independently selected. 

In structures as set forth above that contain a T²-C(=O)-N(R¹)- group,
this group may be formed by reacting an amine of the formula HN(R¹)- with an organic
acid selected from the following, which are exemplary only and do not constitute an
exhaustive list of potential organic acids: Formic acid, Acetic acid, Propiolic acid,
Propionic acid, Fluoroacetic acid, 2-Butynoic acid, Cyclopropanecarboxylic acid,
Butyric acid, Methoxyacetic acid, Difluoroacetic acid, 4-Pentynoic acid,
Cyclobutanecarboxylic acid, 3,3-Dimethylacrylic acid, Valeric acid,
N,N-Dimethylglycine, N-Formyl-Gly-OH, Ethoxyacetic acid, [illegible],
Pyrrole-2-carboxylic acid, 3-Furoic acid, [illegible]-5-carboxylic acid,
[illegible], Trifluoroacetic acid, Hexanoic acid, Ac-Gly-OH, [illegible]
acid, Benzoic acid, Nicotinic acid, 2-[illegible]carboxylic acid,
[illegible]pyrrolecarboxylic acid, 2-Cyclopentene-1-acetic acid, Cyclopentylacetic acid,
(S)-(-)-2-Pyrrolidone-5-carboxylic acid, N-Methyl-L-proline, Heptanoic acid, Ac-b-Ala-OH,
2-Ethyl-2-hydroxybutyric acid, 2-(2-Methoxyethoxy)acetic acid, p-Toluic acid,
6-[illegible] acid, 5-Methyl-2-pyrazinecarboxylic acid, [illegible]-carboxylic acid,
4-Fluorobenzoic acid, 3,5-Dimethylisoxazole-4-carboxylic acid,
3-Cyclopentylpropionic acid, Octanoic acid, N,N-Dimethyl[illegible] acid,
Phenylpropiolic acid, Cinnamic acid, 4-Ethylbenzoic acid, p-Anisic acid,
1,2,5-Trimethylpyrrole-3-carboxylic acid, 3-Fluoro-4-methylbenzoic acid,
Ac-DL-Propargylglycine, 3-(Trifluoromethyl)butyric acid, 1-Piperidinepropionic acid,
N-Acetylproline, 3,5-Difluorobenzoic acid, Ac-L-Val-OH, Indole-2-carboxylic acid,
2-Benzofurancarboxylic acid, Benzotriazole-5-carboxylic acid, 4-n-Propylbenzoic acid,
3-Dimethylaminobenzoic acid, 4-Ethoxybenzoic acid, 4-(Methylthio)benzoic acid,
N-(2-Furoyl)glycine, 2-(Methylthio)nicotinic acid, 3-Fluoro-4-methoxybenzoic acid,
Tfa-Gly-OH, 2-Naphthoic acid, Quinaldic acid, Ac-L-Ile-OH, 3-Methylindene-2-carboxylic
acid, 2-Quinoxalinecarboxylic acid, 1-Methylindole-2-carboxylic acid,
2,3,6-Trifluorobenzoic acid, N-Formyl-L-Met-OH, 2-[2-(2-Methoxyethoxy)ethoxy]acetic
acid, 4-n-Butylbenzoic acid, N-Benzoylglycine, 5-Fluoroindole-2-carboxylic acid,
4-n-Propoxybenzoic acid, 4-Acetyl-3,5-dimethyl-2-pyrrolecarboxylic acid,
3,5-Dimethoxybenzoic acid, 2,6-Dimethoxynicotinic acid, Cyclohexanepentanoic acid,
2-Naphthylacetic acid, 4-(1H-Pyrrol-1-yl)benzoic acid, Indole-3-propionic acid,
m-Trifluoromethylbenzoic acid, 5-Methoxyindole-2-carboxylic acid, 4-Pentylbenzoic acid,
Bz-b-Ala-OH, 4-Diethylaminobenzoic acid, 4-n-Butoxybenzoic acid,
3-Methyl-5-CF3-isoxazole-4-carboxylic acid, (3,4-Dimethoxyphenyl)acetic acid,
4-Biphenylcarboxylic acid, Pivaloyl-Pro-OH, Octanoyl-Gly-OH, (2-Naphthoxy)acetic acid,
Indole-3-butyric acid, 4-(Trifluoromethyl)phenylacetic acid, 5-Methoxyindole-3-acetic acid,
4-(Trifluoromethoxy)benzoic acid, Ac-L-Phe-OH, 4-Pentyloxybenzoic acid, Z-Gly-OH,
4-Carboxy-N-(fur-2-ylmethyl)pyrrolidin-2-one, 3,4-Diethoxybenzoic acid,
2,4-Dimethyl-5-CO2Et-pyrrole-3-carboxylic acid, N-(2-Fluorophenyl)succinamic acid,
3,4,5-Trimethoxybenzoic acid, N-Phenylanthranilic acid, 3-Phenoxybenzoic acid,
Nonanoyl-Gly-OH, 2-Phenoxypyridine-3-carboxylic acid,
2,5-Dimethyl-1-phenylpyrrole-3-carboxylic acid, trans-4-(Trifluoromethyl)cinnamic acid,
(5-Methyl-2-phenyloxazol-4-yl)acetic acid, 4-(2-Cyclohexenyloxy)benzoic acid,
5-Methoxy-2-methylindole-3-acetic acid, trans-4-Cotininecarboxylic acid,
Bz-5-Aminovaleric acid, 4-Hexyloxybenzoic acid,
N-([illegible]-Methoxyphenyl)succinamic acid, Z-Sar-OH,
4-(3,4-Dimethoxyphenyl)butyric acid, Ac-o-Fluoro-DL-Phe-OH,
N-(4-Fluorophenyl)glutaramic acid, 4'-Ethyl-4-biphenylcarboxylic acid,
1,2,3,4-Tetrahydroacridinecarboxylic acid, 3-Phenoxyphenylacetic acid,
N-(2,4-Difluorophenyl)succinamic acid, N-Decanoyl-Gly-OH,
(+)-6-Methoxy-α-methyl-2-naphthaleneacetic acid, 3-(Trifluoromethoxy)cinnamic acid,
N-Formyl-DL-Trp-OH, (R)-(+)-α-Methoxy-α-(trifluoromethyl)phenylacetic acid,
Bz-DL-Leu-OH, 4-(Trifluoromethoxy)phenoxyacetic acid, 4-Heptyloxybenzoic acid,
2,3,4-Trimethoxycinnamic acid, 2,6-Dimethoxybenzoyl-Gly-OH,
3-(3,4,5-Trimethoxyphenyl)propionic acid, 2,3,4,5,6-Pentafluorophenoxyacetic acid,
N-(2,4-Difluorophenyl)glutaramic acid, N-Undecanoyl-Gly-OH, 2-(4-Fluorobenzoyl)benzoic
acid, 5-Trifluoromethoxyindole-2-carboxylic acid, N-(2,4-Difluorophenyl)diglycolamic
acid, Ac-L-Trp-OH, Tfa-L-Phenylglycine-OH, 3-Iodobenzoic acid,
3-(4-n-Pentylbenzoyl)propionic acid, 2-Phenyl-4-quinolinecarboxylic acid, 4-Octyloxybenzoic
acid, Bz-L-Met-OH, 3,4,5-Triethoxybenzoic acid, N-Lauroyl-Gly-OH,
3,5-Bis(trifluoromethyl)benzoic acid, Ac-5-Methyl-DL-Trp-OH, 2-Iodophenylacetic acid,
3-Iodo-4-methylbenzoic acid, 3-(4-n-Hexylbenzoyl)propionic acid, N-Hexanoyl-L-Phe-OH,
4-Nonyloxybenzoic acid, 4'-(Trifluoromethyl)-2-biphenylcarboxylic acid, Bz-L-Phe-OH,
N-Tridecanoyl-Gly-OH, 3,5-Bis(trifluoromethyl)phenylacetic acid,
3-(4-n-Heptylbenzoyl)propionic acid, N-Heptanoyl-L-Phe-OH, 4-Decyloxybenzoic acid,
N-(α,α,α-trifluoro-m-tolyl)anthranilic acid, Niflumic acid,
4-(2-Hydroxyhexafluoroisopropyl)benzoic acid, N-Myristoyl-Gly-OH,
3-(4-n-Octylbenzoyl)propionic acid, N-Octanoyl-L-Phe-OH, 4-Undecyloxybenzoic acid,
3-(3,4,5-Trimethoxyphenyl)propionyl-Gly-OH, 8-Iodonaphthoic acid, N-Pentadecanoyl-Gly-OH,
4-Dodecyloxybenzoic acid, N-Palmitoyl-Gly-OH, and N-Stearoyl-Gly-OH.
These organic acids are available from one or more of Advanced ChemTech, Louisville,
KY; Bachem Bioscience Inc., Torrance, CA; Calbiochem-Novabiochem Corp., San
Diego, CA; Farchan Laboratories Inc., Gainesville, FL; Lancaster Synthesis, Windham,
NH; and MayBridge Chemical Company (c/o Ryan Scientific), Columbia, SC. The
catalogs from these companies use the abbreviations which are used above to identify the
acids.

f. Combinatorial Chemistry as a Means for Preparing Tags 
Combinatorial chemistry is a type of synthetic strategy which leads to 
the production of large chemical libraries (see, for example, PCT Application 
Publication No. WO 94/08051). These combinatorial libraries can be used as tags for
the identification of molecules of interest (MOIs). Combinatorial chemistry may be
defined as the systematic and repetitive, covalent connection of a set of different
"building blocks" of varying structures to each other to yield a large array of diverse
molecular entities. Building blocks can take many forms, both naturally occurring and
synthetic, such as nucleophiles, electrophiles, dienes, alkylating or acylating agents,
diamines, nucleotides, amino acids, sugars, lipids, organic monomers, synthons, and
combinations of the above. Chemical reactions used to connect the building blocks
may involve alkylation, acylation, oxidation, reduction, hydrolysis, substitution,
elimination, addition, cyclization, condensation, and the like. This process can produce
libraries of compounds which are oligomeric, non-oligomeric, or combinations thereof.
If oligomeric, the compounds can be branched, unbranched, or cyclic. Examples of
oligomeric structures which can be prepared by combinatorial methods include
oligopeptides, oligonucleotides, oligosaccharides, polylipids, polyesters, polyamides,
polyurethanes, polyureas, polyethers, poly(phosphorus derivatives), e.g., phosphates,
phosphonates, phosphoramides, phosphonamides, phosphites, phosphinamides, etc., and
poly(sulfur derivatives), e.g., sulfones, sulfonates, sulfites, sulfonamides, sulfenamides,
etc.
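
The "systematic and repetitive" connection of building blocks described above amounts to enumerating every ordered combination of blocks at each position of an oligomer, so a library of linear oligomers grows as the number of blocks raised to the power of the chain length. The following sketch illustrates only that counting argument; the function names `enumerate_library` and `library_size`, the single-letter block labels and the hyphen-joined string representation are assumptions made for illustration and do not model any actual coupling chemistry.

```python
from itertools import product


def enumerate_library(building_blocks, length):
    """List every linear oligomer of the given length as a hyphen-joined string."""
    return ["-".join(combo) for combo in product(building_blocks, repeat=length)]


def library_size(n_blocks, length):
    """Number of distinct linear oligomers without enumerating them."""
    return n_blocks ** length


if __name__ == "__main__":
    blocks = ["A", "B", "C"]  # hypothetical building block labels
    library = enumerate_library(blocks, 3)
    print(len(library), library[:5])
    # Counting only: 20 building blocks at 6 positions.
    print(library_size(20, 6))
```

For example, twenty building blocks at six positions already give 64,000,000 distinct linear sequences, which is why explicit enumeration is only practical for small illustrative cases.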



One common type of oligomeric combinatorial library is the peptide 
combinatorial library. Recent innovations in peptide chemistry and molecular biology 
have enabled libraries consisting of tens to hundreds of millions of different peptide 
sequences to be prepared and used. Such libraries can be divided into three broad 
categories. One category of libraries involves the chemical synthesis of soluble
non-support-bound peptide libraries (e.g., Houghten et al., Nature 354:84, 1991). A second
category involves the chemical synthesis of support-bound peptide libraries, presented
on solid supports such as plastic pins, resin beads, or cotton (Geysen et al., Mol.
Immunol. 23:709, 1986; Lam et al., Nature 354:82, 1991; Eichler and Houghten,
Biochemistry 32:11035, 1993). In these first two categories, the building blocks are
typically L-amino acids, D-amino acids, unnatural amino acids, or some mixture or
combination thereof. A third category uses molecular biology approaches to prepare
peptides or proteins on the surface of filamentous phage particles or plasmids (Scott and
Craig, Curr. Opinion Biotech. 5:40, 1994). Soluble, non-support-bound peptide libraries
appear to be suitable for a number of applications, including use as tags. The available
repertoire of chemical diversities in peptide libraries can be expanded by steps such as
permethylation (Ostresh et al., Proc. Natl. Acad. Sci. USA 91:11138, 1994).
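
The scale of such libraries follows from simple counting: a linear oligomer of a given length built from a fixed set of building blocks can be assembled in (number of building blocks) raised to the power of (length) ways. The following is a minimal sketch of that arithmetic; the function names and the choice of 20 building blocks and length 6 are illustrative assumptions, not values taken from the cited references.

```python
from itertools import product


def library_size(n_building_blocks: int, length: int) -> int:
    """Count the distinct linear oligomers of the given length."""
    return n_building_blocks ** length


def enumerate_library(building_blocks, length):
    """Yield every linear sequence of the given length (practical for small cases only)."""
    for combo in product(building_blocks, repeat=length):
        yield "".join(combo)


# Hypothetical example: hexamers drawn from 20 building blocks.
print(library_size(20, 6))                 # 64000000
# Hypothetical example: dimers from a two-letter alphabet.
print(list(enumerate_library("AG", 2)))    # ['AA', 'AG', 'GA', 'GG']
```

Under these assumptions a hexamer library already reaches tens of millions of members, which is consistent with the library sizes described above.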

Numerous variants of peptide combinatorial libraries are possible in
which the peptide backbone is modified, and/or the amide bonds have been replaced by
mimetic groups. Amide mimetic groups which may be used include ureas, urethanes,
and carbonylmethylene groups. Restructuring the backbone such that sidechains
emanate from the amide nitrogens of each amino acid, rather than the alpha-carbons,
gives libraries of compounds known as peptoids (Simon et al., Proc. Natl. Acad. Sci.
USA 89:9367, 1992).

Another common type of oligomeric combinatorial library is the 
oligonucleotide combinatorial library, where the building blocks are some form of 
naturally occurring or unnatural nucleotide or polysaccharide derivatives, including 
where various organic and inorganic groups may substitute for the phosphate linkage,
and nitrogen or sulfur may substitute for oxygen in an ether linkage (Schneider et al.,
Biochem. 34:9599, 1995; Freier et al., J. Med. Chem. 38:344, 1995; Frank, J.
Biotechnology 41:259, 1995; Schneider et al., Published PCT WO 942052; Ecker et al.,
Nucleic Acids Res. 21:1853, 1993).

More recently, the combinatorial production of collections of non-oligomeric,
small molecule compounds has been described (DeWitt et al., Proc. Natl.
Acad. Sci. USA 90:690, 1993; Bunin et al., Proc. Natl. Acad. Sci. USA 91:4708, 1994).
Structures suitable for elaboration into small-molecule libraries encompass a wide
variety of organic molecules, for example heterocyclics, aromatics, alicyclics,
aliphatics, steroids, antibiotics, enzyme inhibitors, ligands, hormones, drugs, alkaloids,
opioids, terpenes, porphyrins, toxins, catalysts, as well as combinations thereof.

g. Specific Methods for Combinatorial Synthesis of Tags 
Two methods for the preparation and use of a diverse set of amine-containing
MS tags are outlined below. In both methods, solid phase synthesis is
employed to enable simultaneous parallel synthesis of a large number of tagged linkers,
using the techniques of combinatorial chemistry. In the first method, the eventual
cleavage of the tag from the oligonucleotide results in liberation of a carboxyl amide.
In the second method, cleavage of the tag produces a carboxylic acid. The chemical
components and linking elements used in these methods are abbreviated as follows:



R                    = resin
FMOC                 = fluorenylmethoxycarbonyl protecting group
All                  = allyl protecting group
CO2H                 = carboxylic acid group
CONH2                = carboxylic amide group
NH2                  = amino group
OH                   = hydroxyl group
CONH                 = amide linkage
COO                  = ester linkage
NH2 - Rink - CO2H    = 4-[(α-amino)-2,4-dimethoxybenzyl]-phenoxybutyric acid (Rink linker)
OH - 1MeO - CO2H     = (4-hydroxymethyl)phenoxybutyric acid
OH - 2MeO - CO2H     = (4-hydroxymethyl-3-methoxy)phenoxyacetic acid
NH2 - A - COOH       = amino acid with aliphatic or aromatic amine functionality in side chain
X1...Xn - COOH       = set of n diverse carboxylic acids with unique molecular weights
oligo1...oligo(n)    = set of n oligonucleotides
HBTU                 = O-benzotriazol-1-yl-N,N,N',N'-tetramethyluronium hexafluorophosphate

The sequence of steps in Method 1 is as follows:

OH - 2MeO - CONH - R
   ↓ FMOC - NH - Rink - CO2H; couple (e.g., HBTU)
FMOC - NH - Rink - COO - 2MeO - CONH - R
   ↓ piperidine (remove FMOC)
NH2 - Rink - COO - 2MeO - CONH - R
   ↓ FMOC - NH - A - COOH; couple (e.g., HBTU)
FMOC - NH - A - CONH - Rink - COO - 2MeO - CONH - R
   ↓ piperidine (remove FMOC)
NH2 - A - CONH - Rink - COO - 2MeO - CONH - R
   ↓ divide into n aliquots
   ↓↓↓↓↓ couple to n different acids X1...Xn - COOH
X1...Xn - CONH - A - CONH - Rink - COO - 2MeO - CONH - R
   ↓↓↓↓↓ cleave tagged linkers from resin with 1% TFA
X1...Xn - CONH - A - CONH - Rink - CO2H
   ↓↓↓↓↓ couple to n oligos (oligo1...oligo(n)) (e.g., via Pfp esters)
X1...Xn - CONH - A - CONH - Rink - CONH - oligo1...oligo(n)
   ↓ pool tagged oligos
   ↓ perform sequencing reaction
   ↓ separate different length fragments from sequencing reaction (e.g., via HPLC or CE)
   ↓ cleave tags from linkers with 25%-100% TFA
X1...Xn - CONH - A - CONH2
   ↓
analyze by mass spectrometry

The sequence of steps in Method 2 is as follows:

OH - 1MeO - CO2 - All
   ↓ FMOC - NH - A - CO2H; couple (e.g., HBTU)
FMOC - NH - A - COO - 1MeO - CO2 - All
   ↓ Palladium (remove Allyl)
FMOC - NH - A - COO - 1MeO - CO2H
   ↓ OH - 2MeO - CONH - R; couple (e.g., HBTU)
FMOC - NH - A - COO - 1MeO - COO - 2MeO - CONH - R
   ↓ piperidine (remove FMOC)
NH2 - A - COO - 1MeO - COO - 2MeO - CONH - R
   ↓ divide into n aliquots
   ↓↓↓↓↓ couple to n different acids X1...Xn - CO2H
X1...Xn - CONH - A - COO - 1MeO - COO - 2MeO - CONH - R
   ↓↓↓↓↓ cleave tagged linkers from resin with 1% TFA
X1...Xn - CONH - A - COO - 1MeO - CO2H
   ↓↓↓↓↓ couple to n oligos (oligo1...oligo(n)) (e.g., via Pfp esters)
X1...Xn - CONH - A - COO - 1MeO - CONH - oligo1...oligo(n)
   ↓ pool tagged oligos
   ↓ perform sequencing reaction
   ↓ separate different length fragments from sequencing reaction (e.g., via HPLC or CE)
   ↓ cleave tags from linkers with 25-100% TFA
X1...Xn - CONH - A - CO2H
   ↓
analyze by mass spectrometry
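
Both methods depend on the n carboxylic acids having unique molecular weights, so that each tag released in the final step can be traced back to the oligonucleotide it was coupled to. The following is a minimal sketch of that bookkeeping. The tag names, the masses, and the mass tolerance are hypothetical values chosen for illustration; they do not describe any particular acid set or instrument.

```python
from itertools import combinations


def masses_are_resolvable(tag_masses, tolerance):
    """Return True if every pair of tag masses differs by more than `tolerance`."""
    return all(abs(a - b) > tolerance
               for a, b in combinations(tag_masses.values(), 2))


def identify_tag(observed_mass, tag_masses, tolerance):
    """Return the name of the closest tag if it lies within `tolerance`, else None."""
    name, mass = min(tag_masses.items(),
                     key=lambda item: abs(item[1] - observed_mass))
    return name if abs(mass - observed_mass) <= tolerance else None


# Hypothetical masses (in daltons) for three cleaved tags X1..X3.
tags = {"X1": 250.1, "X2": 264.1, "X3": 278.2}

print(masses_are_resolvable(tags, 1.0))   # True
print(identify_tag(264.3, tags, 0.5))     # X2
print(identify_tag(300.0, tags, 0.5))     # None
```

Checking resolvability before coupling the acids to the linkers is a design choice of this sketch: two tags that cannot be distinguished by mass would make the pooled oligonucleotides indistinguishable after cleavage.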



2. Linkers

A "linker" component (or L), as used herein, means either a direct
covalent bond or an organic chemical group which is used to connect a "tag" (or T) to a
"molecule of interest" (or MOI) through covalent chemical bonds. In addition, the
direct bond itself, or one or more bonds within the linker component, is cleavable under
conditions which allow T to be released (in other words, cleaved) from the remainder
of the T-L-X compound (including the MOI component). The tag variable component
which is present within T should be stable to the cleavage conditions. Preferably, the
cleavage can be accomplished rapidly, within a few minutes and preferably within
about 15 seconds or less.

In general, a linker is used to connect each of a large set of tags to each
of a similarly large set of MOIs. Typically, a single tag-linker combination is attached
to each MOI (to give various T-L-MOI), but in some cases, more than one tag-linker
combination may be attached to each individual MOI (to give various (T-L)n-MOI). In
another embodiment of the present invention, two or more tags are bonded to a single
linker through multiple, independent sites on the linker, and this multiple tag-linker
combination is then bonded to an individual MOI (to give various (T)n-L-MOI).

After various manipulations of the set of tagged MOIs, special chemical
and/or physical conditions are used to cleave one or more covalent bonds in the linker,
resulting in the liberation of the tags from the MOIs. The cleavable bond(s) may or
may not be some of the same bonds that were formed when the tag, linker, and MOI
were connected together. The design of the linker will, in large part, determine the
conditions under which cleavage may be accomplished. Accordingly, linkers may be
identified by the cleavage conditions to which they are particularly susceptible. When a
linker is photolabile (i.e., prone to cleavage by exposure to actinic radiation), the linker
may be given the designation L^hν. Likewise, the designations L^acid, L^base, L^[O], L^[R],
L^enz, L^elc, L^Δ and L^ss may be used to refer to linkers that are particularly susceptible to
cleavage by acid, base, chemical oxidation, chemical reduction, the catalytic activity of
an enzyme (more simply "enzyme"), electrochemical oxidation or reduction, elevated
temperature ("thermal") and thiol exchange, respectively.
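
The designations above amount to a lookup from a label to a cleavage condition. The following is a minimal sketch of such a lookup; the ASCII labels (for example `L^hv` for the photolabile designation and `L^delta` for the thermal one) and the function name are assumptions made for illustration, since the original designations are typeset as superscripts.

```python
# Assumed plain-text renderings of the linker designations, mapped to the
# cleavage condition each one denotes.
LINKER_DESIGNATIONS = {
    "L^hv": "photolysis",
    "L^acid": "acid",
    "L^base": "base",
    "L^[O]": "chemical oxidation",
    "L^[R]": "chemical reduction",
    "L^enz": "enzyme",
    "L^elc": "electrochemical oxidation or reduction",
    "L^delta": "thermal",
    "L^ss": "thiol exchange",
}


def designations_for(labile_to):
    """Return the labels, in table order, for every condition in `labile_to`."""
    unknown = set(labile_to) - set(LINKER_DESIGNATIONS.values())
    if unknown:
        raise ValueError(f"unrecognized cleavage condition(s): {sorted(unknown)}")
    return [label for label, condition in LINKER_DESIGNATIONS.items()
            if condition in labile_to]


# A hypothetical linker that is labile to both acid and photolysis.
print(designations_for({"acid", "photolysis"}))   # ['L^hv', 'L^acid']
```

Returning a list rather than a single label reflects that a linker may be labile to more than one type of cleavage condition.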

Certain types of linker are labile to a single type of cleavage condition, 
whereas others are labile to several types of cleavage conditions. In addition, in linkers 
which are capable of bonding multiple tags (to give (T)n-L-MOI type structures), each 
of the tag-bonding sites may be labile to different cleavage conditions. For example, in 
a linker having two tags bonded to it, one of the tags may be labile only to base, and the
other labile only to photolysis. 

A linker which is useful in the present invention possesses several
attributes:

1) The linker possesses a chemical handle (L^h) through which it can be
attached to an MOI.

2) The linker possesses a second, separate chemical handle (L^h) through
which the tag is attached to the linker. If multiple tags are attached to a single linker
((T)n-L-MOI type structures), then a separate handle exists for each tag.

3) The linker is stable toward all manipulations to which it is subjected,
with the exception of the conditions which allow cleavage such that a T-containing
moiety is released from the remainder of the compound, including the MOI. Thus, the
linker is stable during attachment of the tag to the linker, attachment of the linker to the
MOI, and any manipulations of the MOI while the tag and linker (T-L) are attached to
it.

4) The linker does not significantly interfere with the manipulations
performed on the MOI while the T-L is attached to it. For instance, if the T-L is
attached to an oligonucleotide, the T-L must not significantly interfere with any
hybridization or enzymatic reactions (e.g., PCR) performed on the oligonucleotide.
Similarly, if the T-L is attached to an antibody, it must not significantly interfere with
antigen recognition by the antibody.

5) Cleavage of the tag from the remainder of the compound occurs in a
highly controlled manner, using physical or chemical processes that do not adversely
affect the detectability of the tag.

For any given linker, it is preferred that the linker be attachable to a wide
variety of MOIs, and that a wide variety of tags be attachable to the linker. Such
flexibility is advantageous because it allows a library of T-L conjugates, once prepared,
to be used with several different sets of MOIs.

As explained above, a preferred linker has the formula

L^h-L^1-L^2-L^3-L^h

wherein each L^h is a reactive handle that can be used to link the linker to a tag reactant
and a molecule of interest reactant. L^2 is an essential part of the linker, because L^2
imparts lability to the linker. L^1 and L^3 are optional groups which effectively serve to
separate L^2 from the handles L^h.
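
The five-part formula can be pictured as a record with two required handles, one required labile group, and two optional spacers. The following is a minimal sketch of such a record; the class name, field names, and the example groups are hypothetical and serve only to show which parts are required and which may be a direct bond.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Linker:
    """Illustrative record for a linker of the form Lh-L1-L2-L3-Lh."""
    tag_handle: str             # handle that reacts with the tag
    labile_group: str           # L2, the required cleavable moiety
    moi_handle: str             # handle that reacts with the MOI
    l1: Optional[str] = None    # optional spacer on the tag side (None = direct bond)
    l3: Optional[str] = None    # optional spacer on the MOI side (None = direct bond)

    def formula(self) -> str:
        """Join the parts that are present, skipping direct bonds."""
        parts = [self.tag_handle, self.l1, self.labile_group, self.l3, self.moi_handle]
        return "-".join(part for part in parts if part is not None)


# Hypothetical linkers, written with placeholder group names.
bare = Linker(tag_handle="NH2", labile_group="o-nitrobenzyl", moi_handle="CO2H")
spaced = Linker(tag_handle="NH2", labile_group="o-nitrobenzyl", moi_handle="CO2H",
                l1="CH2")

print(bare.formula())     # NH2-o-nitrobenzyl-CO2H
print(spaced.formula())   # NH2-CH2-o-nitrobenzyl-CO2H
```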

L^1 (which, by definition, is nearer to T than is L^3) serves to separate T
from the required labile moiety L^2. This separation may be useful when the cleavage
reaction generates particularly reactive species (e.g., free radicals) which may cause
random changes in the structure of the T-containing moiety. As the cleavage site is
further separated from the T-containing moiety, there is a reduced likelihood that
reactive species formed at the cleavage site will disrupt the structure of the T-containing
moiety. Also, as the atoms in L^1 will typically be present in the T-containing moiety,
these L^1 atoms may impart a desirable quality to the T-containing moiety. For example,
where the T-containing moiety is a T^ms-containing moiety, and a hindered amine is
desirably present as part of the structure of the T^ms-containing moiety (to serve, e.g., as
a MSSE), the hindered amine may be present in the L^1 labile moiety.

In other instances, L^1 and/or L^3 may be present in a linker component
merely because the commercial supplier of a linker chooses to sell the linker in a form
having such an L^1 and/or L^3 group. In such an instance, there is no harm in using linkers
having L^1 and/or L^3 groups (so long as these groups do not inhibit the cleavage reaction),
even though they may not contribute any particular performance advantage to the
compounds that incorporate them. Thus, the present invention allows for L^1 and/or L^3
groups to be present in the linker component.

L^1 and/or L^3 groups may be a direct bond (in which case the group is
effectively not present), a hydrocarbylene group (e.g., alkylene, arylene, cycloalkylene,
etc.), -O-hydrocarbylene (e.g., -O-CH2-, -O-CH2CH(CH3)-, etc.) or
hydrocarbylene-(O-hydrocarbylene)w- wherein w is an integer ranging from 1 to about 10
(e.g., -CH2-O-Ar-, -CH2-(O-CH2CH2)4-, etc.).

With the advent of solid phase synthesis, a great body of literature has 
developed regarding linkers that are labile to specific reaction conditions. In typical 
solid phase synthesis, a solid support is bonded through a labile linker to a reactive site, 
and a molecule to be synthesized is generated at the reactive site. When the molecule 
has been completely synthesized, the solid support-linker-molecule construct is 
subjected to cleavage conditions which release the molecule from the solid support.
The labile linkers which have been developed for use in this context (or which may be 
used in this context) may also be readily used as the linker reactant in the present 
invention. 

Lloyd-Williams, P., et al., "Convergent Solid-Phase Peptide Synthesis",
Tetrahedron Report No. 347, 49(48):11065-11133 (1993) provides an extensive
discussion of linkers which are labile to actinic radiation (i.e., photolysis), as well as 
acid, base and other cleavage conditions. Additional sources of information about labile 
linkers are well known in the art. 

As described above, different linker designs will confer cleavability 
("lability") under different specific physical or chemical conditions. Examples of 
conditions which serve to cleave various designs of linker include acid, base, oxidation, 
reduction, fluoride, thiol exchange, photolysis, and enzymatic conditions. 

Examples of cleavable linkers that satisfy the general criteria for linkers 
listed above will be well known to those in the art and include those found in the 
catalog available from Pierce (Rockford, IL). Examples include: 

• ethylene glycolbis(succinimidylsuccinate) (EGS), an amine reactive
cross-linking reagent which is cleavable by hydroxylamine (1 M at 37°C
for 3-6 hours);

• disuccinimidyl tartarate (DST) and sulfo-DST, which are amine reactive
cross-linking reagents, cleavable by 0.015 M sodium periodate;

• bis[2-(succinimidyloxycarbonyloxy)ethyl]sulfone (BSOCOES) and
sulfo-BSOCOES, which are amine reactive cross-linking reagents,
cleavable by base (pH 11.6);

• 1,4-di-[3'-(2'-pyridyldithio)propionamido]butane (DPDPB), a
pyridyldithiol crosslinker which is cleavable by thiol exchange or
reduction;

• N-[4-(p-azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide
(APDP), a pyridyldithiol crosslinker which is cleavable by thiol
exchange or reduction;

• bis-[beta-(4-azidosalicylamido)ethyl]disulfide, a photoreactive
crosslinker which is cleavable by thiol exchange or reduction;

• N-succinimidyl-(4-azidophenyl)-1,3'-dithiopropionate (SADP), a
photoreactive crosslinker which is cleavable by thiol exchange or
reduction;

• sulfosuccinimidyl-2-(7-azido-4-methylcoumarin-3-acetamide)ethyl-1,3'-dithiopropionate
(SAED), a photoreactive crosslinker which is cleavable
by thiol exchange or reduction;

• sulfosuccinimidyl-2-(m-azido-o-nitrobenzamido)-ethyl-1,3'-dithiopropionate
(SAND), a photoreactive crosslinker which is
cleavable by thiol exchange or reduction.

Other examples of cleavable linkers and the cleavage conditions that can
be used to release tags are as follows. A silyl linking group can be cleaved by fluoride
or under acidic conditions. A 3-, 4-, 5-, or 6-substituted-2-nitrobenzyloxy or 2-, 3-, 5-,
or 6-substituted-4-nitrobenzyloxy linking group can be cleaved by a photon source
(photolysis). A 3-, 4-, 5-, or 6-substituted-2-alkoxyphenoxy or 2-, 3-, 5-, or
6-substituted-4-alkoxyphenoxy linking group can be cleaved by Ce(NH4)2(NO3)6
(oxidation). A NCO2 (urethane) linker can be cleaved by hydroxide (base), acid, or
LiAlH4 (reduction). A 3-pentenyl, 2-butenyl, or 1-butenyl linking group can be cleaved
by O3, OsO4/IO4-, or KMnO4 (oxidation). A 2-[3-, 4-, or 5-substituted-furyl]oxy linking
group can be cleaved by O2, Br2, MeOH, or acid.

Conditions for the cleavage of other labile linking groups include:
t-alkyloxy linking groups can be cleaved by acid; methyl(dialkyl)methoxy or
4-substituted-2-alkyl-1,3-dioxolane-2-yl linking groups can be cleaved by H3O+;
2-silylethoxy linking groups can be cleaved by fluoride or acid; 2-(X)-ethoxy (where
X = keto, ester, amide, cyano, NO2, sulfide, sulfoxide, sulfone) linking groups can be
cleaved under alkaline conditions; 2-, 3-, 4-, 5-, or 6-substituted-benzyloxy linking
groups can be cleaved by acid or under reductive conditions; 2-butenyloxy linking
groups can be cleaved by (Ph3P)3RhCl(H); 3-, 4-, 5-, or 6-substituted-2-bromophenoxy
linking groups can be cleaved by Li, Mg, or BuLi; methylthiomethoxy linking groups
can be cleaved by Hg2+; 2-(X)-ethyloxy (where X = a halogen) linking groups can be
cleaved by Zn or Mg; 2-hydroxyethyloxy linking groups can be cleaved by oxidation
(e.g., with Pb(OAc)4).
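
The linker-to-condition pairings listed above lend themselves to a small lookup table, for example to find all linkers released by a given condition or to check whether two linkers share any cleavage condition. The following is a minimal sketch covering a handful of the entries; the short keys, the condition strings, and the function names are assumptions made for illustration, and treating "no shared condition" as independence is a simplification that ignores cross-reactivity.

```python
# A few of the linker/condition pairings described in the text, keyed by
# shortened illustrative names.
CLEAVAGE_CONDITIONS = {
    "EGS": {"hydroxylamine"},
    "DST": {"periodate"},
    "BSOCOES": {"base"},
    "DPDPB": {"thiol exchange", "reduction"},
    "silyl": {"fluoride", "acid"},
    "2-nitrobenzyloxy": {"photolysis"},
    "t-alkyloxy": {"acid"},
}


def cleaved_by(condition):
    """Return the sorted names of all listed linkers cleaved by `condition`."""
    return sorted(name for name, conditions in CLEAVAGE_CONDITIONS.items()
                  if condition in conditions)


def share_no_condition(linker_a, linker_b):
    """Return True if the two linkers have no listed cleavage condition in common."""
    return CLEAVAGE_CONDITIONS[linker_a].isdisjoint(CLEAVAGE_CONDITIONS[linker_b])


print(cleaved_by("acid"))                                  # ['silyl', 't-alkyloxy']
print(share_no_condition("BSOCOES", "2-nitrobenzyloxy"))   # True
print(share_no_condition("silyl", "t-alkyloxy"))           # False
```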

Preferred linkers are those that are cleaved by acid or photolysis. Several
of the acid-labile linkers that have been developed for solid phase peptide synthesis are
useful for linking tags to MOIs. Some of these linkers are described in a recent review
by Lloyd-Williams et al. (Tetrahedron 49:11065-11133, 1993). One useful type of
linker is based upon p-alkoxybenzyl alcohols, of which two,
4-hydroxymethylphenoxyacetic acid and 4-(4-hydroxymethyl-3-methoxyphenoxy)butyric
acid, are commercially available from Advanced ChemTech (Louisville, KY). Both
linkers can be attached to a tag via an ester linkage to the benzyl alcohol, and to an
amine-containing MOI via an amide linkage to the carboxylic acid. Tags linked by
these molecules are released from the MOI with varying concentrations of
trifluoroacetic acid. The cleavage of these linkers results in the liberation of a
carboxylic acid on the tag. Acid cleavage of tags attached through related linkers, such
as 2,4-dimethoxy-4'-(carboxymethyloxy)-benzhydrylamine (available from Advanced
ChemTech in FMOC-protected form), results in liberation of a carboxylic amide on the
released tag.

The photolabile linkers useful for this application have also been for the
most part developed for solid phase peptide synthesis (see Lloyd-Williams review).
These linkers are usually based on 2-nitrobenzylesters or 2-nitrobenzylamides. Two
examples of photolabile linkers that have recently been reported in the literature are
4-(4-(1-(Fmoc-amino)ethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (Holmes and Jones,
J. Org. Chem. 60:2318-2319, 1995) and 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic
acid (Brown et al., Molecular Diversity 1:4-12, 1995). Both linkers can be attached via
the carboxylic acid to an amine on the MOI. The attachment of the tag to the linker is
made by forming an amide between a carboxylic acid on the tag and the amine on the
linker. Cleavage of photolabile linkers is usually performed with UV light of 350 nm
wavelength at intensities and times known to those in the art. Cleavage of the linkers
results in liberation of a primary amide on the tag. Examples of photocleavable linkers
include nitrophenyl glycine esters, exo- and endo-2-benzonorborneyl chlorides and
methane sulfonates, and 3-amino-3-(2-nitrophenyl)propionic acid. Examples of
enzymatic cleavage include esterases which will cleave ester bonds, nucleases which
will cleave phosphodiester bonds, proteases which cleave peptide bonds, etc.



A preferred linker component has an ortho-nitrobenzyl structure as
shown below:

[Structure: ortho-nitrobenzyl linker component, with positions labeled a, b, c, d and e]

wherein one carbon atom at positions a, b, c, d or e is substituted with -L^3-X, and L^1
(which is preferably a direct bond) is present to the left of N(R') in the above structure.
Such a linker component is susceptible to selective photo-induced cleavage of the bond
between the carbon labeled "a" and N(R'). The identity of R' is not typically critical to
the cleavage reaction; however, R' is preferably selected from hydrogen and
hydrocarbyl. The present invention provides that in the above structure, -N(R')- could
be replaced with -O-. Also in the above structure, one or more of positions b, c, d or e
may optionally be substituted with alkyl, alkoxy, fluoride, chloride, hydroxyl,
carboxylate or amide, where these substituents are independently selected at each
occurrence.



A further preferred linker component with a chemical handle L^h has the
following structure:




wherein one or more of positions b, c, d or e is substituted with hydrogen, alkyl, alkoxy,
fluoride, chloride, hydroxyl, carboxylate or amide, R' is hydrogen or hydrocarbyl, and
R^2 is -OH or a group that either protects or activates a carboxylic acid for coupling with
another moiety. Fluorocarbon and hydrofluorocarbon groups are preferred groups that
activate a carboxylic acid toward coupling with another moiety.

3. Molecule of Interest (MOI)

Examples of MOIs include nucleic acids or nucleic acid analogues (e.g.,
PNA), fragments of nucleic acids (i.e., nucleic acid fragments), synthetic nucleic acids
or fragments, oligonucleotides (e.g., DNA or RNA), proteins, peptides, antibodies or
antibody fragments, receptors, receptor ligands, members of a ligand pair, cytokines,
hormones, oligosaccharides, synthetic organic molecules, drugs, and combinations
thereof.

Preferred MOIs include nucleic acid fragments. Preferred nucleic acid
fragments are primer sequences that are complementary to sequences present in vectors,
where the vectors are used for base sequencing. Preferably a nucleic acid fragment is
attached directly or indirectly to a tag at other than the 3' end of the fragment, and most
preferably at the 5' end of the fragment. Nucleic acid fragments may be purchased or
prepared based upon genetic databases (e.g., Dib et al., Nature 380:152-154, 1996 and
CEPH Genotype Database, http://www.cephb.fr) and commercial vendors (e.g.,
Promega, Madison, WI).

As used herein, MOI includes derivatives of an MOI that contain
functionality useful in joining the MOI to a T-L-L^h compound. For example, a nucleic
acid fragment that has a phosphodiester at the 5' end, where the phosphodiester is also
bonded to an alkyleneamine, is an MOI. Such an MOI is described in, e.g., U.S. Patent
4,762,779 which is incorporated herein by reference. A nucleic acid fragment with an
internal modification is also an MOI. An exemplary internal modification of a nucleic
acid fragment is where the base (e.g., adenine, guanine, cytosine, thymine, uracil) has
been modified to add a reactive functional group. Such internally modified nucleic acid
fragments are commercially available from, e.g., Glen Research, Herndon, VA.
Another exemplary internal modification of a nucleic acid fragment is where an abasic
phosphoramidate is used to synthesize a modified phosphodiester which is interposed
between a sugar and phosphate group of a nucleic acid fragment. The abasic
phosphoramidate contains a reactive group which allows a nucleic acid fragment that
contains this phosphoramidate-derived moiety to be joined to another moiety, e.g., a
T-L-L^h compound. Such abasic phosphoramidates are commercially available from, e.g.,
Clontech Laboratories, Inc., Palo Alto, CA.

4. Chemical Handles (L^h)

A chemical handle is a stable yet reactive atomic arrangement present as
part of a first molecule, where the handle can undergo chemical reaction with a
complementary chemical handle present as part of a second molecule, so as to form a
covalent bond between the two molecules. For example, the chemical handle may be a
hydroxyl group, and the complementary chemical handle may be a carboxylic acid
group (or an activated derivative thereof, e.g., a hydrofluoroaryl ester), whereupon
reaction between these two handles forms a covalent bond (specifically, an ester group)
that joins the two molecules together.

Chemical handles may be used in a large number of covalent bond-forming
reactions that are suitable for attaching tags to linkers, and linkers to MOIs.
Such reactions include alkylation (e.g., to form ethers, thioethers), acylation (e.g., to
form esters, amides, carbamates, ureas, thioureas), phosphorylation (e.g., to form
phosphates, phosphonates, phosphoramides, phosphonamides), sulfonylation (e.g., to
form sulfonates, sulfonamides), condensation (e.g., to form imines, oximes,
hydrazones), silylation, disulfide formation, and generation of reactive intermediates,
such as nitrenes or carbenes, by photolysis. In general, handles and bond-forming
reactions which are suitable for attaching tags to linkers are also suitable for attaching
linkers to MOIs, and vice-versa. In some cases, the MOI may undergo prior
modification or derivatization to provide the handle needed for attaching the linker.

One type of bond especially useful for attaching linkers to MOIs is the
disulfide bond. Its formation requires the presence of a thiol group ("handle") on the
linker, and another thiol group on the MOI. Mild oxidizing conditions then suffice to
bond the two thiols together as a disulfide. Disulfide formation can also be induced by
using an excess of an appropriate disulfide exchange reagent, e.g., pyridyl disulfides.
Because disulfide formation is readily reversible, the disulfide may also be used as the
cleavable bond for liberating the tag, if desired. This is typically accomplished under
similarly mild conditions, using an excess of an appropriate thiol exchange reagent, e.g.,
dithiothreitol.

Of particular interest for linking tags (or tags with linkers) to
oligonucleotides is the formation of amide bonds. Primary aliphatic amine handles can
be readily introduced onto synthetic oligonucleotides with phosphoramidites such as
6-monomethoxytritylhexylcyanoethyl-N,N-diisopropyl phosphoramidite (available from
Glen Research, Sterling, VA). The amines found on natural nucleotides such as
adenosine and guanosine are virtually unreactive when compared to the introduced
primary amine. This difference in reactivity forms the basis of the ability to selectively
form amides and related bonding groups (e.g., ureas, thioureas, sulfonamides) with the
introduced primary amine, and not the nucleotide amines.

As listed in the Molecular Probes catalog (Eugene, OR), a partial
enumeration of amine-reactive functional groups includes activated carboxylic esters,
isocyanates, isothiocyanates, sulfonyl halides, and dichlorotriazines. Active esters are
excellent reagents for amine modification since the amide products formed are very
stable. Also, these reagents have good reactivity with aliphatic amines and low
reactivity with the nucleotide amines of oligonucleotides. Examples of active esters
include N-hydroxysuccinimide esters, pentafluorophenyl esters, tetrafluorophenyl
esters, and p-nitrophenyl esters. Active esters are useful because they can be made from
virtually any molecule that contains a carboxylic acid. Methods to make active esters
are listed in Bodansky (Principles of Peptide Chemistry (2d ed.), Springer Verlag,
London, 1993).

5. Linker Attachment

Typically, a single type of linker is used to connect a particular set or
family of tags to a particular set or family of MOIs. In a preferred embodiment of the
invention, a single, uniform procedure may be followed to create all the various
T-L-MOI structures. This is especially advantageous when the set of T-L-MOI structures is
large, because it allows the set to be prepared using the methods of combinatorial
chemistry or other parallel processing technology. In a similar manner, the use of a
single type of linker allows a single, uniform procedure to be employed for cleaving all
the various T-L-MOI structures. Again, this is advantageous for a large set of T-L-MOI
structures, because the set may be processed in a parallel, repetitive, and/or automated
manner.

There are, however, other embodiments of the present invention, wherein
two or more types of linker are used to connect different subsets of tags to
corresponding subsets of MOIs. In this case, selective cleavage conditions may be used
to cleave each of the linkers independently, without cleaving the linkers present on
other subsets of MOIs.

A large number of covalent bond-forming reactions are suitable for
attaching tags to linkers, and linkers to MOIs. Such reactions include alkylation (e.g.,
to form ethers, thioethers), acylation (e.g., to form esters, amides, carbamates, ureas,
thioureas), phosphorylation (e.g., to form phosphates, phosphonates, phosphoramides,
phosphonamides), sulfonylation (e.g., to form sulfonates, sulfonamides), condensation
(e.g., to form imines, oximes, hydrazones), silylation, disulfide formation, and
generation of reactive intermediates, such as nitrenes or carbenes, by photolysis. In
general, handles and bond-forming reactions which are suitable for attaching tags to
linkers are also suitable for attaching linkers to MOIs, and vice-versa. In some cases,
the MOI may undergo prior modification or derivatization to provide the handle needed
for attaching the linker.

One type of bond especially useful for attaching linkers to MOIs is the 
disulfide bond. Its formation requires the presence of a thiol group ("handle") on the 
linker, and another thiol group on the MOI. Mild oxidizing conditions then suffice to 
bond the two thiols together as a disulfide. Disulfide formation can also be induced by 
using an excess of an appropriate disulfide exchange reagent, e.g., pyridyl disulfides. 
Because disulfide formation is readily reversible, the disulfide may also be used as the 
cleavable bond for liberating the tag, if desired. This is typically accomplished under 
similarly mild conditions, using an excess of an appropriate thiol exchange reagent, e.g.,
dithiothreitol.

Of particular interest for linking tags to oligonucleotides is the formation 
of amide bonds. Primary aliphatic amine handles can be readily introduced onto
synthetic oligonucleotides with phosphoramidites such as
6-monomethoxytritylhexylcyanoethyl-N,N-diisopropyl phosphoramidite (available from
Glen Research, Sterling, VA). The amines found on natural nucleotides such as
adenosine and guanosine are virtually unreactive when compared to the introduced 
primary amine. This difference in reactivity forms the basis of the ability to selectively 
form amides and related bonding groups (e.g., ureas, thioureas, sulfonamides) with the 
introduced primary amine, and not the nucleotide amines. 

As listed in the Molecular Probes catalog (Eugene, OR), a partial 
enumeration of amine-reactive functional groups includes activated carboxylic esters, 
isocyanates, isothiocyanates, sulfonyl halides, and dichlorotriazines. Active esters are
excellent reagents for amine modification since the amide products formed are very 
stable. Also, these reagents have good reactivity with aliphatic amines and low 
reactivity with the nucleotide amines of oligonucleotides. Examples of active esters 
include N-hydroxysuccinimide esters, pentafluorophenyl esters, tetrafluorophenyl 
esters, and p-nitrophenyl esters. Active esters are useful because they can be made from
virtually any molecule that contains a carboxylic acid. Methods to make active esters
are listed in Bodansky (Principles of Peptide Chemistry (2d ed.), Springer Verlag,
London, 1993).

Numerous commercial cross-linking reagents exist which can serve as
linkers (e.g., see Pierce Cross-linkers, Pierce Chemical Co., Rockford, IL). Among
these are homobifunctional amine-reactive cross-linking reagents which are exemplified
by homobifunctional imidoesters and N-hydroxysuccinimidyl (NHS) esters. There also
exist heterobifunctional cross-linking reagents which possess two or more different
reactive groups that allow for sequential reactions. Imidoesters react rapidly with amines
at alkaline pH. NHS-esters give stable products when reacted with primary or secondary
amines. Maleimides, alkyl and aryl halides, alpha-haloacyls and pyridyl disulfides are
thiol reactive. Maleimides are specific for thiol (sulfhydryl) groups in the pH range of
6.5 to 7.5, and at alkaline pH can become amine reactive. The thioether linkage is stable
under physiological conditions. Alpha-haloacetyl cross-linking reagents contain the
iodoacetyl group and are reactive towards sulfhydryls. Imidazoles can react with the
iodoacetyl moiety, but the reaction is very slow. Pyridyl disulfides react with thiol
groups to form a disulfide bond. Carbodiimides couple carboxyls to primary amines of
hydrazides, which gives rise to the formation of an acyl-hydrazine bond. The arylazides
are photoaffinity reagents which are chemically inert until exposed to UV or visible
light. When such compounds are photolyzed at 250-460 nm, a reactive aryl nitrene is
formed. The reactive aryl nitrene is relatively non-specific. Glyoxals are reactive
towards the guanidinyl portion of arginine.

In one typical embodiment of the present invention, a tag is first bonded
to a linker, then the combination of tag and linker is bonded to a MOI, to create the
structure T-L-MOI. Alternatively, the same structure is formed by first bonding a linker
to a MOI, and then bonding the combination of linker and MOI to a tag. An example is
where the MOI is a DNA primer or oligonucleotide. In that case, the tag is typically
first bonded to a linker, then the T-L is bonded to a DNA primer or oligonucleotide,
which is then used, for example, in a sequencing reaction.

One useful form in which a tag could be reversibly attached to an MOI
(e.g., an oligonucleotide or DNA sequencing primer) is through a chemically labile
linker. One preferred design for the linker allows the linker to be cleaved when exposed
to a volatile organic acid, for example, trifluoroacetic acid (TFA). TFA in particular is
compatible with most methods of MS ionization, including electrospray.

As described in detail below, the invention provides a method for
determining the sequence of a nucleic acid molecule. A composition which may be
formed by the inventive method comprises a plurality of compounds of the formula:

T^ms-L-MOI

wherein T^ms is an organic group detectable by mass spectrometry. T^ms
contains carbon, at least one of hydrogen and fluoride, and may contain optional atoms
including oxygen, nitrogen, sulfur, phosphorus and iodine. In the formula, L is an
organic group which allows a T^ms-containing moiety to be cleaved from the remainder
of the compound upon exposure of the compound to a cleavage condition. The cleaved
T^ms-containing moiety includes a functional group which supports a single ionized
charge state when each of the plurality of compounds is subjected to mass spectrometry.
The functional group may be a tertiary amine, quaternary amine or an organic acid. In
the formula, MOI is a nucleic acid fragment which is conjugated to L via the 5' end of
the MOI. The term "conjugated" means that there may be chemical groups intermediate
L and the MOI, e.g., a phosphodiester group and/or an alkylene group. The nucleic acid
fragment may have a sequence complementary to a portion of a vector, wherein the
fragment is capable of priming nucleotide synthesis.

In the composition, no two compounds have either the same T^ms or the
same MOI. In other words, the composition includes a plurality of compounds, wherein
each compound has both a unique T^ms and a unique nucleic acid fragment (unique in
that it has a unique base sequence). In addition, the composition may be described as
having a plurality of compounds wherein each compound is defined as having a unique
T^ms, where the T^ms is unique in that no other compound has a T^ms that provides the
same signal by mass spectrometry. The composition therefore contains a plurality of
compounds, each having a T^ms with a unique mass. The composition may also be
described as having a plurality of compounds wherein each compound is defined as
having a unique nucleic acid sequence. These nucleic acid sequences are intentionally
unique so that each compound will serve as a primer for only one vector, when the
composition is combined with vectors for nucleic acid sequencing. The set of
compounds having unique T^ms groups is the same set of compounds which has unique
nucleic acid sequences.

Preferably, the T^ms groups are unique in that there is at least a 2 amu,
more preferably at least a 3 amu, and still more preferably at least a 4 amu mass
separation between the T^ms groups of any two different compounds. In the
composition, there are at least 2 different compounds, preferably there are more than 2
different compounds, and more preferably there are more than 4 different compounds.
The composition may contain 100 or more different compounds, each compound having
a unique T^ms and a unique nucleic acid sequence.
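
The mass-separation preference above can be checked mechanically for any proposed set of tags. The following is a minimal sketch only; the function names and the example masses are hypothetical and are not taken from this specification.

```python
def min_mass_separation(tag_masses):
    """Return the smallest difference (in amu) between any two tag masses."""
    if len(tag_masses) < 2:
        raise ValueError("need at least two tag masses")
    ordered = sorted(tag_masses)
    # After sorting, the closest pair is always a pair of neighbours.
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def tags_are_distinguishable(tag_masses, min_separation_amu=2.0):
    """True if every pair of tags is at least min_separation_amu apart."""
    return min_mass_separation(tag_masses) >= min_separation_amu


# Hypothetical masses for a four-compound composition, for illustration only.
example_masses = [300.0, 304.0, 308.0, 312.0]
print(min_mass_separation(example_masses))
print(tags_are_distinguishable(example_masses, 4.0))
```

The default threshold of 2 amu corresponds to the least stringent preference stated above; passing 3.0 or 4.0 applies the more preferred separations.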

Another composition that is useful in, e.g., determining the sequence of a
nucleic acid molecule, includes water and a compound of the formula T^ms-L-MOI,
wherein T^ms is an organic group detectable by mass spectrometry. T^ms contains carbon,
at least one of hydrogen and fluoride, and may contain optional atoms including
oxygen, nitrogen, sulfur, phosphorus and iodine. In the formula, L is an organic group
which allows a T^ms-containing moiety to be cleaved from the remainder of the
compound upon exposure of the compound to a cleavage condition. The cleaved
T^ms-containing moiety includes a functional group which supports a single ionized charge
state when each of the plurality of compounds is subjected to mass spectrometry. The
functional group may be a tertiary amine, quaternary amine or an organic acid. In the
formula, MOI is a nucleic acid fragment attached at its 5' end.

In addition to water, this composition may contain a buffer, in order to
maintain the pH of the aqueous composition within the range of about 5 to about 9.
Furthermore, the composition may contain an enzyme, salts (such as MgCl2 and NaCl)
and one of dATP, dGTP, dCTP, and dTTP. A preferred composition contains water,
T^ms-L-MOI and one (and only one) of ddATP, ddGTP, ddCTP, and ddTTP. Such a
composition is suitable for use in the dideoxy sequencing method.

The invention also provides a composition which contains a plurality of
sets of compounds, wherein each set of compounds has the formula:

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine. L is an organic group which allows a
T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the
T^ms-containing moiety comprises a functional group which supports a single ionized charge
state when the compound is subjected to mass spectrometry and is selected from tertiary
amine, quaternary amine and organic acid. The MOI is a nucleic acid fragment wherein
L is conjugated to MOI at the MOI's 5' end.

Within a set, all members have the same T^ms group, and the MOI
fragments have variable lengths that terminate with the same dideoxynucleotide
selected from ddAMP, ddGMP, ddCMP and ddTMP; and between sets, the T^ms groups
differ by at least 2 amu, preferably by at least 3 amu. The plurality of sets is preferably
at least 5 and may number 100 or more.

In a preferred composition comprising a first plurality of sets as
described above, there is additionally present a second plurality of sets of compounds
having the formula

T^ms-L-MOI

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at
least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen,
sulfur, phosphorus and iodine. L is an organic group which allows a T^ms-containing
moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing
moiety comprises a functional group which supports a single ionized charge state when
the compound is subjected to mass spectrometry and is selected from tertiary amine,
quaternary amine and organic acid. MOI is a nucleic acid fragment wherein L is
conjugated to MOI at the MOI's 5' end. All members within the second plurality have
an MOI sequence which terminates with the same dideoxynucleotide selected from
ddAMP, ddGMP, ddCMP and ddTMP; with the proviso that the dideoxynucleotide
present in the compounds of the first plurality is not the same dideoxynucleotide present
in the compounds of the second plurality.

The invention also provides a kit for DNA sequencing analysis. The kit
comprises a plurality of container sets, where each container set includes at least five
containers. The first container contains a vector. The second, third, fourth and fifth
containers contain compounds of the formula:

T^ms-L-MOI

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at
least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen,
sulfur, phosphorus and iodine. L is an organic group which allows a T^ms-containing
moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing
moiety comprises a functional group which supports a single ionized charge state when
the compound is subjected to mass spectrometry and is selected from tertiary amine,
quaternary amine and organic acid. MOI is a nucleic acid fragment wherein L is
conjugated to MOI at the MOI's 5' end. The MOI for the second, third, fourth and fifth
containers is identical and complementary to a portion of the vector within the set of
containers, and the T^ms group within each container is different from the other T^ms
groups in the kit.

Preferably, within the kit, the plurality is at least 3, i.e., there are at least 
three sets of containers. More preferably, there are at least 5 sets of containers. 

As noted above, the present invention provides compositions and
methods for determining the sequence of nucleic acid molecules. Briefly, such methods
generally comprise the steps of (a) generating tagged nucleic acid fragments which are
complementary to a selected nucleic acid molecule (e.g., tagged fragments from a first
terminus to a second terminus of a nucleic acid molecule), wherein a tag is correlative
with a particular or selected nucleotide, and may be detected by any of a variety of
methods, (b) separating the tagged fragments by sequential length, (c) cleaving a tag
from a tagged fragment, and (d) detecting the tags, and thereby determining the
sequence of the nucleic acid molecule. Each of the aspects will be discussed in more
detail below.



B. SEQUENCING METHODS AND STRATEGIES

As noted above, the present invention provides methods for determining 
the sequence of a nucleic acid molecule. Briefly, tagged nucleic acid fragments are 
prepared. The nucleic acid fragments are complementary to a selected target nucleic 
acid molecule. In a preferred embodiment, the nucleic acid fragments are produced
from a first terminus to a second terminus of a nucleic acid molecule, and more
preferably from a 5' terminus to a 3' terminus. In other preferred embodiments, the
tagged fragments are generated from 5'-tagged oligonucleotide primers or tagged
dideoxynucleotide terminators. A tag of a tagged nucleic acid fragment is correlative 
with a particular nucleotide and is detectable by spectrometry (including fluorescence, 
but preferably other than fluorescence), or by potentiometry. In a preferred 
embodiment, at least five tagged nucleic acid fragments are generated and each tag is 
unique for a nucleic acid fragment. More specifically, the number of tagged fragments 
will generally range from about 5 to 2,000. The tagged nucleic acid fragments may be 
generated from a variety of compounds, including those set forth above. It will be 
evident to one in the art that the methods of the present invention are not limited to use 
only of the representative compounds and compositions described herein. 

Following generation of tagged nucleic acid fragments, the tagged 
fragments are separated by sequential length. Such separation may be performed by a 
variety of techniques. In a preferred embodiment, separation is by liquid 
chromatography (LC) and particularly preferred is HPLC. Next, the tag is cleaved from 
the tagged fragment. The particular method for breaking a bond to release the tag is 
selected based upon the particular type of susceptibility of the bond to cleavage. For 
example, a light-sensitive bond (i.e., one that breaks by light) will be exposed to light. 
The released tag is detected by spectrometry or potentiometry. Preferred detection 
means are mass spectrometry, infrared spectrometry, ultraviolet spectrometry and
potentiostatic amperometry (e.g., with an amperometric detector or coulometric
detector).
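
The separate, cleave and detect steps described above can be illustrated by a short sketch in which each detected tag is translated back into the nucleotide with which it is correlative. This is an illustration under stated assumptions, not the method of the invention: the tag masses, the function name `read_sequence`, and the assumption that each fragment length yields exactly one tag are all hypothetical.

```python
def read_sequence(detections, tag_to_base):
    """Rebuild a base sequence from detected tags.

    detections  -- iterable of (fragment_length, tag_mass) pairs, in any order
    tag_to_base -- dict mapping a tag mass to the nucleotide it is correlative with
    """
    # Separation "by sequential length": the shortest fragment is read first.
    ordered = sorted(detections, key=lambda d: d[0])
    lengths = [length for length, _ in ordered]
    if len(set(lengths)) != len(lengths):
        raise ValueError("more than one tag detected for the same fragment length")
    return "".join(tag_to_base[mass] for _, mass in ordered)


# Hypothetical tag masses (amu), one per terminating nucleotide.
tag_to_base = {300: "A", 304: "C", 308: "G", 312: "T"}

# Hypothetical detector output, deliberately out of order.
detections = [(21, 308), (19, 300), (20, 304), (22, 312)]
print(read_sequence(detections, tag_to_base))
```

In a continuous-flow system the detections would already arrive in order of fragment length, so the sort is included only to make the sketch independent of input order.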

It will be appreciated by one in the art that one or more of the steps may
be automated, e.g., by use of an instrument. In addition, the separation, cleavage and
detection steps may be performed in a continuous manner (e.g., continuous
flow/continuous fluid path of tagged fragments through separation to cleavage to tag
detection). For example, the various steps may be incorporated into a system, such that
the steps are performed in a continuous manner. Such a system is typically in an
instrument or combination of instruments format. For example, tagged nucleic acid
fragments that are separated (e.g., by HPLC) may flow into a device for cleavage (e.g.,
a photo-reactor) and then into a tag detector (e.g., a mass spectrometer or coulometric or
amperometric detector). Preferably, the device for cleavage is tunable so that an
optimum wavelength for the cleavage reaction can be selected.

It will be apparent to one in the art that the methods of the present
invention for nucleic acid sequencing may be performed for a variety of purposes. For
example, uses of the present methods include primary sequence determination for
viral, bacterial, prokaryotic and eukaryotic (e.g., mammalian) nucleic acid molecules;
mutation detection; diagnostics; forensics; identity; and polymorphism detection.

1. Sequencing Methods 

As noted above, compounds including those of the present invention
may be utilized for a variety of sequencing methods, including both enzymatic and
chemical degradation methods. Briefly, the enzymatic method described by Sanger
(Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy-terminators,
involves the synthesis of a DNA strand from a single-stranded template by a DNA
polymerase. The Sanger method of sequencing depends on the fact that
dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way
as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ from
normal deoxynucleotides (dNTPs) in that they lack the 3'-OH group necessary for chain
elongation. When a ddNTP is incorporated into the DNA chain, the absence of the
3'-hydroxy group prevents the formation of a new phosphodiester bond and the DNA
fragment is terminated with the ddNTP complementary to the base in the template
DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci.
(USA) 74:560, 1977) employs a chemical degradation method of the original DNA (in
both cases the DNA must be clonal). Both methods produce populations of fragments
that begin from a particular point and terminate in every base that is found in the DNA
fragment that is to be sequenced. The termination of each fragment is dependent on the
location of a particular base within the original DNA fragment. The DNA fragments
are separated by polyacrylamide gel electrophoresis and the order of the DNA bases
(A,C,T,G) is read from an autoradiograph of the gel.
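
The way a population of terminated fragments encodes the sequence can be shown with an idealized sketch. It assumes, for illustration only, that each of the four ddNTP reactions yields exactly one fragment for every position at which its base occurs in the newly synthesized strand; the function names and the 18-nucleotide primer length are hypothetical.

```python
def dideoxy_fragments(new_strand, primer_length=18):
    """Return, for each ddNTP reaction, the lengths of the terminated fragments.

    new_strand -- the strand synthesized by the polymerase (complement of the template)
    """
    reactions = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        # A fragment ending at this position appears only in the reaction
        # containing the matching dideoxynucleotide.
        reactions[base].append(primer_length + position)
    return reactions


def read_ladder(reactions):
    """Read the bases in order of increasing fragment length, as from a gel."""
    base_by_length = {
        length: base for base, lengths in reactions.items() for length in lengths
    }
    return "".join(base_by_length[length] for length in sorted(base_by_length))


reactions = dideoxy_fragments("GATTACA")
print(reactions)
print(read_ladder(reactions))
```

Because every fragment length occurs in exactly one of the four reactions, ordering the fragments by length and noting which reaction each came from recovers the sequence.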



2. Exonuclease DNA Sequencing

A procedure for determining DNA nucleotide sequences was reported by
Labeit et al. (S. Labeit, H. Lehrach & R. S. Goody, DNA 5:173-7, 1986; A new method
of DNA sequencing using deoxynucleoside alpha-thiotriphosphates). In the first step of
the method, four DNAs, each separately substituted with a different deoxynucleoside
phosphorothioate in place of the corresponding monophosphate, are prepared by
template-directed polymerization catalyzed by DNA polymerase. In the second step,
these DNAs are subjected to stringent exonuclease III treatment, which produces only
fragments terminating with a phosphorothioate internucleotide linkage. These can then
be separated by standard gel electrophoresis techniques and the sequence can be read
directly as in presently used sequencing methods. Porter et al. (K. W. Porter, J.
Tomasz, F. Huang, A. Sood & B. R. Shaw, Biochemistry 34:11963-11969, 1995;
N7-cyanoborane-2'-deoxyguanosine 5'-triphosphate is a good substrate for DNA
polymerase) described a new set of boron-substituted nucleotide analogs which are also
exonuclease resistant and good substrates for a number of polymerases; these bases are
also suitable for exonuclease DNA sequencing.

3. A Simplified Strategy for Sequencing Large Numbers of Full Length
cDNAs.

cDNA sequencing has been suggested as an alternative to generating the
complete human genomic sequence. Two approaches have been attempted. The first
involves generation of expressed sequence tags (ESTs) through a single DNA sequence
pass at one end of each cDNA clone. This method has given insights into the
distribution of types of expressed sequences and has revealed occasional useful
homology with genomic fragments, but overall has added little to our knowledge base
since insufficient data from each clone is provided. The second approach is to generate
complete cDNA sequence which can indicate the possible function of the cDNAs.
Unfortunately most cDNAs are of a size range of 1-4 kilobases which hinders the
automation of full-length sequence determination. Currently the most efficient method
for large scale, high throughput sequence production is sequencing from a
vector/primer site, which typically yields less than 500 bases of sequence from each
flank. The synthesis of new oligonucleotide primers of length 15-18 bases for "primer
walking" can allow closure of each sequence. An alternative strategy for full length
cDNA sequencing is to generate modified templates that are suitable for sequencing
with a universal primer, but provide overlapping coverage of the molecules.
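
The burden that primer walking places on a 1-4 kilobase cDNA can be estimated with simple arithmetic. The sketch below is an illustration under stated assumptions rather than a figure from this specification: it assumes one read from each flank per round, a usable read of 450 bases (the text states only that reads are typically less than 500 bases), and a 50-base overlap between a read and the next custom primer's read. The function name is hypothetical.

```python
import math


def custom_primer_rounds(insert_length, read_length=450, overlap=50):
    """Estimate the rounds of custom-primer synthesis needed to close an insert.

    The first round uses the vector/primer sites on both flanks and needs no
    custom primers; each later round extends both ends by (read_length - overlap).
    """
    if read_length <= overlap:
        raise ValueError("read_length must exceed overlap")
    remaining = insert_length - 2 * read_length
    if remaining <= 0:
        return 0  # the two flanking reads already meet
    return math.ceil(remaining / (2 * (read_length - overlap)))


for length in (800, 1000, 3000, 4000):
    print(length, custom_primer_rounds(length))
```

Under these assumptions an 800-base insert needs no custom primers, whereas a 4 kilobase insert needs four successive rounds of primer synthesis, which is the serial dependency that hinders automation.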

Shotgun sequencing methods can be applied to cDNA sequencing
studies by preparing a separate library from each cDNA clone. These methods have not
been used extensively for the analysis of the 1.5 - 4.0 kilobase fragments, however, as
they are very labor intensive during the initial cloning phase. Instead they have
generally been applied to projects where the target sequence is of the order of 15 to 40
kilobases, such as in lambda or cosmid inserts.

4. Analogy of cDNA with Genomic Sequencing 

Despite the typically different size of the individual clones to be
analyzed in cDNA sequencing, there are similarities with the requirements for large
scale genomic DNA sequencing. In addition to a low cost per base, and a high
throughput, the ideal strategy for full length cDNA sequencing will have a high
accuracy. The favored current methodology for genomic DNA sequencing involves the
preparation of shotgun sequencing libraries from cosmids, followed by random
sequencing using ABI fluorescent DNA sequencing instruments, and closure (finishing)
by directed efforts. Overall there is agreement that the fluorescent shotgun approach is
superior to current alternatives in terms of efficiency and accuracy. The initial shotgun
library quality is a critical determinant of the ease and quality of sequence assembly.
The high quality of the available shotgun library procedure has prompted a strategy for
the production of multiplex shotgun libraries containing mixtures of the smaller cDNA
clones. Here the individual clones to be sequenced are mixed prior to library
construction and then identified following random sequencing, at the stage of computer
analysis. Junctions between individual clones are labeled during library production
either by PCR or by identification of vector arm sequence.

Clones may be prepared either by microbial methods or by PCR. When
using PCR, three reactions from each clone are used in order to minimize the risk of
errors.

One pass sequencing is a new technique designed to speed the
identification of important sequences within a new region of genomic DNA. Briefly, a
high quality shotgun library is prepared and then the sequences sampled to obtain
80-95% coverage. For a cosmid this would typically be about 200 samples. Essentially all
genes are likely to have at least one exon detected in this sample using either sequence
similarity (BLAST) or exon structure (GRAIL2) screening.

"Skimming" has been successfully applied to cosmids and P1s. One
pass sequencing is potentially the fastest and least expensive way to find genes in a
positional cloning project. The outcome is virtually assured. Most investigators are
currently developing cosmid contigs for exon trapping and related techniques. Cosmids
are completely suitable for sequence skimming. P1 and other BACs could be
considerably cheaper since there are savings both in shotgun library construction and
minimization of overlaps.
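
The relationship between the number of random samples and the fraction of a clone that is covered can be approximated with the Lander-Waterman expression, coverage = 1 - exp(-N x L / G). This model is not part of the present specification and is used here only as an illustration; the 400-base read length and 40 kilobase cosmid insert are likewise assumed values, and the function names are hypothetical.

```python
import math


def expected_coverage(n_reads, read_length, target_length):
    """Expected fraction of the target covered by n_reads random reads."""
    redundancy = n_reads * read_length / target_length
    return 1.0 - math.exp(-redundancy)


def reads_needed(coverage, read_length, target_length):
    """Smallest number of random reads expected to reach the given coverage."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return math.ceil(-math.log(1.0 - coverage) * target_length / read_length)


# Assumed values: 400-base reads sampled from a 40,000-base cosmid insert.
print(round(expected_coverage(200, 400, 40_000), 3))
print(reads_needed(0.80, 400, 40_000), reads_needed(0.95, 400, 40_000))
```

With these assumed values, 200 samples give roughly 86% expected coverage, which falls within the 80-95% range stated above for a cosmid.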

5. Shotgun Sequencing

Shotgun DNA sequencing starts with random fragmentation of the target
DNA. Random sequencing is then used to generate the majority of the data. A directed 
phase then completes gaps, ensuring coverage of each strand in both directions. 
Shotgun sequencing offers the advantage of high accuracy at relatively low cost. The 
procedure is best suited to the analysis of relatively large fragments and is the method 
of choice in large scale genomic DNA sequencing. 

There are several factors that are important in making shotgun 
sequencing accurate and cost effective. A major consideration is the quality of the 
shotgun library that is generated, since any clones that do not have inserts, or have 
chimeric inserts, will result in subsequent inefficient sequencing. Another consideration 
is the careful balancing of the random and the directed phases of the sequencing, so that 
high accuracy is obtained with a minimal loss of efficiency through unnecessary 
sequencing. 



6. Sequencing Chemistry: Tagged-Terminator Chemistry

There are two types of fluorescent sequencing chemistries currently
available: dye primer, where the primer is fluorescently labeled, and dye terminator,
where the dideoxy terminators are labeled. Each of these chemistries can be used with
either Taq DNA polymerase or sequenase enzymes. Sequenase enzyme seems to read
easily through G-C rich regions, palindromes, simple repeats and other difficult to read
sequences. Sequenase is also good for sequencing mixed populations. Sequenase
sequencing requires 5 µg of template, one extension and a multi-step cleanup process.
Tagged-primer sequencing requires four separate reactions, one for each of A, C, G and
T, and then a laborious cleanup protocol. Taq terminator cycle sequencing chemistry is
the most robust sequencing method. With this method any sequencing primer can be
used. The amount of template needed is relatively small and the whole reaction process
from setup to cleanup is reasonably easy, compared to sequenase and dye primer
chemistries. Only 1.5 µg of DNA template and 4 pmol of primer are needed. To this a
ready reaction mix is added. This mix consists of buffer, enzyme, dNTPs and labeled
dideoxynucleotides. This reaction can be done in one tube as each of the four dideoxies
is labeled with a different fluorescent dye. These labeled terminators are present in this
mix in excess because they are difficult to incorporate during extension. With unclean
DNA the incorporation of these high molecular weight dideoxies can be inhibited. The
premix includes dITP to minimize band compression. The use of Taq as the DNA
polymerase allows the reactions to be run at high temperatures to minimize secondary
structure problems as well as non-specific primer binding. The whole cocktail goes
through 25 cycles of denaturation, annealing and extension in a thermal cycler and the
completed reaction is spun through a Sephadex G50 (Pharmacia, Piscataway, NJ)
column and is ready for gel loading after five minutes in a vacuum desiccator.

7. Designing Primers

When designing primers, the same criteria should be used as for
designing PCR primers. In particular, primers should preferably be 18 to 20 nucleotides
long and the 3-prime end base should be a G or a C. Primers should also preferably
have a Tm of more than 50°C. Primers shorter than 18 nucleotides will work but are
not recommended. The shorter the primer the greater the probability of it binding at
more than one site on the template DNA, and the lower its Tm. The sequence should
have 100% match with the template. Any mismatch, especially towards the 3-prime end,
will greatly diminish sequencing ability. However, primers with 5-prime tails can be
used as long as there are about 18 bases at the 3-prime end that bind. If one is designing
a primer from a sequence chromatogram, an area with high confidence must be used. As
one moves out past 350 to 400 bases on a standard chromatogram, the peaks get broader
and the base calls are not as accurate. As described herein, the primer may possess a 5'
handle through which a linker or linker tag may be attached.
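
The primer criteria stated above lend themselves to an automated check. The following sketch is illustrative only. The Tm estimate uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an assumption introduced here and not a formula given in this specification; the function names and the example primers are hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, min_len=18, max_len=20, min_tm=50):
    """Return a list of the stated design preferences that the primer fails."""
    primer = primer.upper()
    if not primer:
        return ["primer is empty"]
    problems = []
    if set(primer) - set("ACGT"):
        problems.append("contains characters other than A, C, G, T")
    if not min_len <= len(primer) <= max_len:
        problems.append(f"length {len(primer)} is outside {min_len}-{max_len} nucleotides")
    if primer[-1] not in "GC":
        problems.append("3-prime end base is not G or C")
    if wallace_tm(primer) <= min_tm:
        problems.append(f"estimated Tm {wallace_tm(primer)} is not above {min_tm}")
    return problems


# Hypothetical primers, for illustration only.
print(check_primer("ACGTACGTACGTACGTAC"))
print(check_primer("ATATATATATAT"))
```

An empty list indicates that the primer satisfies the stated preferences; the check does not test for a 100% match with the template, which requires the template sequence itself.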

8. Nucleic Acid Template Preparation

The most important factor in tagged-primer DNA sequencing is the
quality of the template. Briefly, one common misconception is that if a template works
in manual sequencing, it should work in automated sequencing. In fact, if a reaction
works in manual sequencing it may work in automated sequencing; however, automated
sequencing is much more sensitive and a poor quality template may result in little or no
data when fluorescent sequencing methods are utilized. High salt concentrations and
other cell material not properly extracted during template preparation, including RNA,
may likewise prevent the ability to obtain accurate sequence information. Many mini
and maxi prep protocols produce DNA which is good enough for manual sequencing or
PCR, but not for automated (tagged-primer) sequencing. Also the use of phenol is not
at all recommended as phenol can intercalate in the helix structure. The use of 100%
chloroform is sufficient. There are a number of DNA preparation methods which are
particularly preferred for the tagged primer sequencing methods provided herein. In
particular, maxi preps which utilize cesium chloride preparations or Qiagen
(Chatsworth, CA) maxi prep columns (being careful not to overload) are preferred. For
mini preps, columns such as Promega's Magic Mini prep (Madison, WI), may be
utilized. When sequencing DNA fragments such as PCR fragments or restriction cut
fragments, it is generally preferred to cut the desired fragment from a low melt agarose
gel and then purify with a product such as GeneClean (La Jolla, CA). It is very
important to make sure that only one band is cut from the gel. For PCR fragments the
PCR primers or internal primers can be used in order to ensure that the appropriate
fragment was sequenced. To get optimum performance from the sequence analysis
software, fragments should be larger than 200 bases. Double stranded or single
stranded DNA can be sequenced by this method.

An additional factor generally taken into account when preparing DNA
for sequencing is the choice of host strain. Companies selling equipment and reagents
for sequencing, such as ABI (Foster City, CA) and Qiagen (Chatsworth, CA), typically
recommend preferred host strains, and have previously recommended strains such as
DH5 alpha, HB101, XL-1 Blue, JM109, MV1190. Even when the DNA preparations
are very clean, there are other inherent factors which can make it difficult to obtain
sequence. G-C rich templates are always difficult to sequence through, and secondary
structure can also cause problems. Sequencing through long repeats often proves to
be difficult. For instance, as Taq moves along a poly T stretch, the enzyme often falls
off the template and jumps back on again, skipping a T. This results in extension
products with X amount of Ts in the poly T stretch and fragments with X-1, X-2 etc.
amounts of Ts in the poly T stretch. The net effect is that more than one base appears in
each position, making the sequence impossible to read.



9. Use of Molecularly Distinct Cloning Vectors

Sequencing may also be accomplished utilizing a universal cloning vector
(M13) and complementary sequencing primers. Briefly, for present cloning vectors the
same primer sequence is used and only 4 tags are employed (each tag is a different
fluorophore which represents a different terminator (ddNTP)), so every amplification
process must take place in a different container (one DNA sample per container). That
is, it is impossible to mix two or more different DNA samples in the same amplification
process. With only 4 tags available, only one DNA sample can be run per gel lane.
There is no convenient means to deconvolute the sequence of more than one DNA
sample with only 4 tags. (In this regard, workers in the field take great care not to mix
or contaminate different DNA samples when using current technologies.)

A substantial advantage is gained when multiples of 4 tags can be run
per gel lane or respective separation process. In particular, utilizing tags of the present
invention, more than one DNA sample can be processed in a single amplification
reaction or container. When multiples of 4 tags are available for use, each tag set can be
assigned to a particular DNA sample that is to be amplified. (A tag set is composed of a
series of 4 different tags, each with a unique property. Each tag is assigned to represent
a different dideoxy-terminator: ddATP, ddGTP, ddCTP, or ddTTP.) To employ this
advantage a series of vectors must be generated in which a unique priming site is
inserted. A unique priming site is simply a stretch of 18 nucleotides which differs from
vector to vector. The remaining nucleotide sequence is conserved from vector to vector.
A sequencing primer is prepared (synthesized) which corresponds to each unique
vector. Each unique primer is derivatized (or labelled) with a unique tag set.

With these respective molecular biology tools in hand, it is possible in
the present invention to process multiple samples in a single container. First, DNA
samples which are to be sequenced are cloned into the multiplicity of vectors. For
example, if 100 unique vectors are available, 100 ligation reactions, plating steps, and
picking of plaques are performed. Second, one sample from each vector type is pooled,
making a pool of 100 unique vectors containing 100 unique DNA fragments or samples.
A given DNA sample is therefore identified and automatically assigned a primer set
with the associated tag set. The respective primers, buffers, polymerase(s), ddNTPs,
dNTPs and co-factors are added to the reaction container and the amplification process
is carried out. The reaction is then subjected to a separation step and the respective
sequence is established from the temporal appearance of tags. The ability to pool
multiple DNA samples has substantial advantages. The reagent cost of a typical PCR
reaction is about $2.00 per sample. With the method described herein the cost of
amplification on a per sample basis could be reduced at least by a factor of 100. Sample
handling could be reduced by a factor of at least 100, and materials costs could be
reduced. The need for large scale amplification robots would be obviated.
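The pooling scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions: integer tag identifiers, the helper names build_tag_map and deconvolute, and the example detection events are all hypothetical and are not taken from the disclosure. Each sample receives its own set of four tags, and the sequence of each sample is recovered from the order in which its tags appear during separation.

```python
BASES = "ACGT"

def build_tag_map(n_samples):
    """Assign every sample its own set of four tags, one per terminator.
    Tag IDs are plain integers here (an assumption for illustration)."""
    return {s * 4 + b: (s, BASES[b]) for s in range(n_samples) for b in range(4)}

def deconvolute(events, tag_map):
    """events: iterable of (detection_time, tag_id).
    Returns {sample_index: sequence read in order of tag appearance}."""
    reads = {}
    for _, tag in sorted(events):
        sample, base = tag_map[tag]
        reads[sample] = reads.get(sample, "") + base
    return reads

tag_map = build_tag_map(100)          # 100 pooled vectors -> 400 tags
# Hypothetical interleaved detections from samples 0 and 1.
events = [(1.0, 0), (1.1, 6), (2.0, 2), (2.1, 7), (3.0, 3), (3.1, 4)]
reads = deconvolute(events, tag_map)
print(reads)

# Reagent cost per sample if one $2.00 reaction serves the whole pool.
cost_per_sample = 2.00 / 100
```

The final line simply restates the factor-of-100 cost reduction given in the text for a pool of 100 samples.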


10. Sequencing Vectors for Cleavable Mass Spectroscopy Tagging

Using cleavable mass spectroscopy tagging (CMST) of the present
invention, each individual sequencing reaction can be read independently and
simultaneously as the separation proceeds. In CMST sequencing, a different primer is
used for each cloning vector: each reaction has 20 different primers when 20 clones are
used per pool. Each primer corresponds to one of the vectors, and each primer is tagged
with a unique CMST molecule. Four reactions are performed on each pooled DNA
sample (one for each base), so every vector has four oligonucleotide primers, each one
identical in sequence but tagged with a different CMS tag. The four separate
sequencing reactions are pooled and run together. When 20 samples are pooled, 80 tags
are used (4 bases per sample times 20 samples), and all 80 are detected simultaneously
as the gel is run.

The construction of the vectors may be accomplished by cloning a
random 20-mer on either side of a restriction site. The resulting clones are sequenced
and a number chosen for use as vectors. Two oligonucleotides are prepared for each
vector chosen, one homologous to the sequence at each side of the restriction site, and
each orientated so that the 3'-end is towards the restriction site. Four tagged
preparations of each primer are prepared, one for each base in the sequencing reactions
and each one labeled with a unique CMS tag.
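The primer orientation described above can be sketched as follows. This is one reading of the text, stated as an assumption: both flanking 20-mers are taken as written 5' to 3' on the top strand, so the primer on the far side of the restriction site is the reverse complement of its flank. The function names, the example flank sequences, and the tag labels are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def flanking_primers(left_flank, right_flank):
    """Return two primers whose 3'-ends point toward the restriction site.
    Both flanks are given 5'->3' on the top strand (assumed convention)."""
    forward = left_flank                        # extends rightward into the insert
    reverse = reverse_complement(right_flank)   # extends leftward into the insert
    return forward, reverse

def tagged_preparations(primer, vector_id):
    """Four tagged preparations of one primer, one per base-specific reaction.
    Tag labels are placeholder strings, not real CMS tag structures."""
    return {base: (primer, f"tag_{vector_id}_{base}") for base in "ACGT"}

# Hypothetical random 20-mers on either side of the cloning site.
fwd, rev = flanking_primers("ACGTACGTACGTACGTACGA", "TTGACCTGAAGGCTTAACGG")
preps = tagged_preparations(fwd, vector_id=1)
print(fwd, rev, sorted(preps))
```

With 20 vectors and one primer per vector, this yields the 80 distinct tags mentioned in the preceding section.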


11. Advantages of Sequencing by the Use of Reversible Tags

There are substantial advantages when cleavable tags are used in
sequencing and related technologies. First, an increase in sensitivity will contribute to
longer read lengths, as will the ability to collect tags for a specified period of time prior
to measurement. The use of cleavable tags permits the development of a system that
equalizes bandwidth over the entire range of the gel (1-1500 nucleotides (nt), for
example). This will greatly impact the ability to obtain read lengths greater than 450 nt.

The use of cleavable multiple tags (MW identifiers) also has the
advantage that multiple DNA samples can be run on a single gel lane or separation
process. For example, it is possible using the methodologies disclosed herein to
combine at least 96 samples and 4 sequencing reactions (A, G, T, C) on a single lane or
fragment sizing process. If multiple vectors are employed which possess unique
priming sites, then at least 384 samples can be combined per gel lane (the different
terminator reactions cannot be amplified together with this scheme). When the ability
to employ cleavable tags is combined with the ability to use multiple vectors, an
apparent 10,000-fold increase in DNA sequencing throughput is achieved. Also, in the
schemes described herein, reagent use and disposables are decreased, with a
resultant decrease in operating costs to the consumer.

An additional advantage is gained from the ability to process internal
controls throughout the entire methodologies described here. For any set of samples, an
internal control nucleic acid can be placed in the sample(s). This is not possible with
the current configurations. This advantage permits the control of the amplification
process, the separation process, the tag detection system and sequence assembly. This
is an immense advantage over current systems in which the controls are always
separated from the samples in all steps.
The compositions and methods described herein also have the advantage
that they are modular in nature and can be fitted on any type of separation process or
method and, in addition, can be fitted onto any type of detection system as
improvements are made in either type of technology. For example, the
methodologies described herein can be coupled with "bundled" CE arrays or
microfabricated devices that enable separation of DNA fragments.

C. SEPARATION OF DNA FRAGMENTS 

A sample that requires analysis is often a mixture of many components
in a complex matrix. For samples containing unknown compounds, the components
must be separated from each other so that each individual component can be identified
by other analytical methods. The separation properties of the components in a mixture
are constant under constant conditions, and therefore once determined they can be used
to identify and quantify each of the components. Such procedures are typical in
chromatographic and electrophoretic analytical separations.

1. High-Performance Liquid Chromatography (HPLC)

High-performance liquid chromatography (HPLC) is a chromatographic
separations technique to separate compounds that are dissolved in solution. HPLC
instruments consist of a reservoir of mobile phase, a pump, an injector, a separation
column, and a detector. Compounds are separated by injecting an aliquot of the sample
mixture onto the column. The different components in the mixture pass through the
column at different rates due to differences in their partitioning behavior between the
mobile liquid phase and the stationary phase.

Recently, IP-RP-HPLC on non-porous PS/DVB particles with
chemically bonded alkyl chains has been shown to be a rapid alternative to capillary
electrophoresis in the analysis of both single- and double-strand nucleic acids, providing
similar degrees of resolution (Huber et al., 1993, Anal. Biochem., 212, p351; Huber et
al., 1993, Nuc. Acids Res., 21, p1061; Huber et al., 1993, Biotechniques, 16, p898). In
contrast to ion-exchange chromatography, which does not always retain double-strand
DNA as a function of strand length (since AT base pairs interact with the positively
charged stationary phase more strongly than GC base pairs), IP-RP-HPLC enables a
strictly size-dependent separation.

A method has been developed using 100 mM triethylammonium acetate
as ion-pairing reagent, in which phosphodiester oligonucleotides could be successfully
separated on alkylated non-porous 2.3 µm poly(styrene-divinylbenzene) particles by
means of high performance liquid chromatography (Oefner et al., 1994, Anal. Biochem.,
223, p39). The technique described allowed the separation of PCR products differing by
only 4 to 8 base pairs in length within a size range of 50 to 200 nucleotides.



2. Electrophoresis 

Electrophoresis is a separations technique that is based on the mobility of
ions (or DNA as is the case described herein) in an electric field. Negatively charged
DNA migrates toward a positive electrode and positively charged ions migrate
toward a negative electrode. For safety reasons one electrode is usually at ground and
the other is biased positively or negatively. Charged species have different migration
rates depending on their total charge, size, and shape, and can therefore be separated.
An electrophoresis apparatus consists of a high-voltage power supply, electrodes, buffer,
and a support for the buffer such as a polyacrylamide gel or a capillary tube. Open
capillary tubes are used for many types of samples and the other gel supports are usually
used for biological samples such as protein mixtures or DNA fragments.

3. Capillary Electrophoresis (CE)

Capillary electrophoresis (CE) in its various manifestations (free
solution, isotachophoresis, isoelectric focusing, polyacrylamide gel, micellar
electrokinetic "chromatography") is developing as a method for rapid high resolution
separations of very small sample volumes of complex mixtures. In combination with the
inherent sensitivity and selectivity of MS, CE-MS is a potentially powerful technique for
bioanalysis. In the novel application disclosed herein, the interfacing of these two
methods will lead to superior DNA sequencing methods that exceed the rate of current
sequencing methods by several orders of magnitude.

The correspondence between CE and electrospray ionization (ESI) flow
rates and the fact that both are facilitated by (and primarily used for) ionic species in
solution provide the basis for an extremely attractive combination. The combination of
both capillary zone electrophoresis (CZE) and capillary isotachophoresis with
quadrupole mass spectrometers based upon ESI has been described (Olivares et al.,
Anal. Chem. 59:1230, 1987; Smith et al., Anal. Chem. 60:436, 1988; Loo et al., Anal.
Chem. 179:404, 1989; Edmonds et al., J. Chroma. 474:21, 1989; Loo et al.,
J. Microcolumn Sep. 1:223, 1989; Lee et al., J. Chromatog. 458:313, 1988; Smith et al.,
J. Chromatog. 450:211, 1989; Grese et al., J. Am. Chem. Soc. 111:2835, 1989). Small
peptides are easily amenable to CZE analysis with good (femtomole) sensitivity.

The most powerful separation method for DNA fragments is
polyacrylamide gel electrophoresis (PAGE), generally in a slab gel format. However,
the major limitation of the current technology is the relatively long time required to
perform the gel electrophoresis of DNA fragments produced in the sequencing
reactions. An order-of-magnitude (10-fold) increase in speed can be achieved with the
use of capillary electrophoresis, which utilizes ultrathin gels. In free solution, to a first
approximation, all DNA fragments migrate with the same mobility, since the addition of
a base results in compensating increases in mass and charge. In polyacrylamide gels,
DNA fragments sieve and migrate as a function of length, and this approach has now
been applied to CE. Remarkable plate numbers per meter have now been achieved with
cross-linked polyacrylamide (10^7 plates per meter, Cohen et al., Proc. Natl. Acad. Sci.
USA 85:9660, 1988). Such CE columns as described can be employed for DNA sequencing.
The method of CE is in principle 25 times faster than slab gel electrophoresis in a
standard sequencer. For example, about 300 bases can be read per hour. The separation
speed is limited in slab gel electrophoresis by the magnitude of the electric field which
can be applied to the gel without excessive heat production. Therefore, the greater speed
of CE is achieved through the use of higher field strengths (300 V/cm in CE versus 10
V/cm in slab gel electrophoresis). The capillary format reduces the amperage and thus
power and the resultant heat generation.
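The relationship between field strength and heating can be made concrete with a short calculation. The sketch below is illustrative only: it assumes a uniform buffer conductivity (the 0.1 S/m value is a placeholder, not a figure from the text) and uses the standard expression for ohmic heating per unit volume, sigma * E^2. Only the two field strengths come from the passage above.

```python
def field_ratio(e_ce_v_per_cm=300.0, e_slab_v_per_cm=10.0):
    """Ratio of the CE field strength to the slab-gel field strength."""
    return e_ce_v_per_cm / e_slab_v_per_cm

def joule_power_per_volume(conductivity_s_per_m, field_v_per_cm):
    """Ohmic heating per unit volume, sigma * E^2, in W/m^3."""
    field_v_per_m = field_v_per_cm * 100.0   # convert V/cm to V/m
    return conductivity_s_per_m * field_v_per_m ** 2

SIGMA = 0.1  # S/m, placeholder conductivity used for both formats

ratio = field_ratio()
heating_ratio = (joule_power_per_volume(SIGMA, 300.0)
                 / joule_power_per_volume(SIGMA, 10.0))
print(ratio, heating_ratio)
```

At equal conductivity the heat generated per unit volume scales with the square of the field, so a 30-fold higher field would only be tolerable if the heated volume is small and heat is removed efficiently, which is consistent with the statement that the capillary format reduces amperage and total power. If migration velocity were strictly proportional to field, the field ratio would suggest a speed gain of roughly the same order as the 25-fold figure quoted above.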

Smith and others (Smith et al., Nuc. Acids Res. 18:4417, 1990) have
suggested employing multiple capillaries in parallel to increase throughput. Likewise,
Mathies and Huang (Mathies and Huang, Nature 359:167, 1992) have introduced
capillary electrophoresis in which separations are performed on a parallel array of
capillaries and demonstrated high through-put sequencing (Huang et al., Anal. Chem.
64:967, 1992; Huang et al., Anal. Chem. 64:2149, 1992). The major disadvantage of
capillary electrophoresis is the limited amount of sample that can be loaded onto the
capillary. By concentrating a large amount of sample at the beginning of the capillary,
prior to separation, loadability is increased, and detection levels can be lowered several
orders of magnitude. The most popular method of preconcentration in CE is sample
stacking. Sample stacking has recently been reviewed (Chien and Burgi, Anal. Chem.
64:489A, 1992). Sample stacking depends on the matrix difference (pH, ionic strength)
between the sample buffer and the capillary buffer, so that the electric field across the
sample zone is greater than that in the capillary region. In sample stacking, a large
volume of sample in a low concentration buffer is introduced for preconcentration at the
head of the capillary column. The capillary is filled with a buffer of the same
composition, but at higher concentration. When the sample ions reach the capillary
buffer and the lower electric field, they stack into a concentrated zone. Sample stacking
has increased detectabilities 1-3 orders of magnitude.
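An idealized estimate of the stacking effect can be sketched as follows. The model is an assumption introduced for illustration and is not stated in the text: conductivity is taken as proportional to buffer concentration, the same current flows through both zones, and so the field in the sample plug exceeds the field in the capillary buffer by the concentration ratio. Function names and the example values are hypothetical.

```python
def stacking_factor(buffer_conc_mM, sample_conc_mM):
    """Idealized field-enhancement factor: E_sample / E_buffer.
    Assumes conductivity proportional to concentration."""
    if buffer_conc_mM <= 0 or sample_conc_mM <= 0:
        raise ValueError("concentrations must be positive")
    return buffer_conc_mM / sample_conc_mM

def stacked_zone_length(plug_length_mm, factor):
    """Length of the sample zone after ions slow down at the buffer boundary."""
    return plug_length_mm / factor

# Hypothetical example: sample in 1 mM buffer, capillary filled with 100 mM.
gamma = stacking_factor(100.0, 1.0)
zone = stacked_zone_length(50.0, gamma)
print(gamma, zone)
```

Under these assumptions a 100-fold dilution of the sample buffer compresses the injected plug about 100-fold, which falls within the 1-3 orders of magnitude improvement cited above. Real systems deviate from this because of electroosmotic mismatch and band broadening.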

Another method of preconcentration is to apply isotachophoresis (ITP)
prior to the free zone CE separation of analytes. ITP is an electrophoretic technique
which allows microliter volumes of sample to be loaded onto the capillary, in contrast
to the low nL injection volumes typically associated with CE. The technique relies on
inserting the sample between two buffers (leading and trailing electrolytes) of higher
and lower mobility, respectively, than the analyte. The technique is inherently a
concentration technique, where the analytes concentrate into pure zones migrating with
the same speed. The technique is currently less popular than the stacking methods
described above because of the need for several choices of leading and trailing
electrolytes, and the ability to separate only cationic or anionic species during a
separation process.

The heart of the DNA sequencing process is the remarkably selective
electrophoretic separation of DNA or oligonucleotide fragments. It is remarkable
because each fragment is resolved and differs by only one nucleotide. Separations of up
to 1000 fragments (1000 bp) have been obtained. A further advantage of sequencing with
cleavable tags is as follows. There is no requirement to use a slab gel format when
DNA fragments are separated by polyacrylamide gel electrophoresis when cleavable
tags are employed. Since numerous samples are combined (4 to 2000) there is no need
to run samples in parallel as is the case with current dye-primer or dye-terminator
methods (i.e., ABI373 sequencer). Since there is no reason to run parallel lanes, there is
no reason to use a slab gel. Therefore, one can employ a tube gel format for the
electrophoretic separation method. Grossman et al. (Genet. Anal. Tech. Appl.
9:9, 1992) have shown that considerable advantage is gained when a tube gel format is
used in place of a slab gel format. This is due to the greater ability to dissipate Joule
heat in a tube format compared to a slab gel, which results in faster run times (by 50%)
and much higher resolution of high molecular weight DNA fragments (greater than
1000 nt). Long reads are critical in genomic sequencing. Therefore, the use of cleavable
tags in sequencing has the additional advantage of allowing the user to employ the most
efficient and sensitive DNA separation method which also possesses the highest
resolution.



4. Microfabricated Devices 

Capillary electrophoresis (CE) is a powerful method for DNA
sequencing, forensic analysis, PCR product analysis and restriction fragment sizing. CE
is far faster than traditional slab PAGE since with capillary gels a far higher potential
field can be applied. However, CE has the drawback of allowing only one sample to be
processed per gel. Microfabricated devices combine the faster separation times of CE
with the ability to analyze multiple samples in parallel. The underlying concept behind
the use of microfabricated devices is the ability to increase the information density in





WO 97/27331 






PCT/US97/01304 



electrophoresis by miniaturizing the lane dimension to about 100 micrometers. The
electronics industry routinely uses microfabrication to make circuits with features of
less than one micron in size. The current density of capillary arrays is limited by the
outside diameter of the capillary tube. Microfabrication of channels produces a higher
density of arrays. Microfabrication also permits physical assemblies not possible with
glass fibers and links the channels directly to other devices on a chip. Few devices have
been constructed on microchips for separation technologies. A gas chromatograph
(Terry et al., IEEE Trans. Electron Device, ED-26:1880, 1979) and a liquid
chromatograph (Manz et al., Sens. Actuators B1:249, 1990) have been fabricated on
silicon chips, but these devices have not been widely used. Several groups have
reported separating fluorescent dyes and amino acids on microfabricated devices (Manz
et al., J. Chromatography 593:253, 1992; Effenhauser et al., Anal. Chem. 65:2637,
1993). Recently Woolley and Mathies (Woolley and Mathies, Proc. Natl. Acad. Sci.
91:11348, 1994) have shown that photolithography and chemical etching can be used to
make large numbers of separation channels on glass substrates. The channels are filled
with hydroxyethyl cellulose (HEC) separation matrices. It was shown that DNA
restriction fragments could be separated in as little as two minutes.






D. CLEAVAGE OF TAGS 

As described above, different linker designs will confer cleavability 
("lability") under different specific physical or chemical conditions. Examples of 
conditions which serve to cleave various designs of linker include acid, base, oxidation, 
reduction, fluoride, thiol exchange, photolysis, and enzymatic conditions. 

Examples of cleavable linkers that satisfy the general criteria for linkers
listed above will be well known to those in the art and include those found in the
catalog available from Pierce (Rockford, IL). Examples include:

• ethylene glycolbis(succinimidylsuccinate) (EGS), an amine reactive
cross-linking reagent which is cleavable by hydroxylamine (1 M at 37°C
for 3-6 hours);

• disuccinimidyl tartarate (DST) and sulfo-DST, which are amine reactive
cross-linking reagents, cleavable by 0.015 M sodium periodate;

• bis[2-(succinimidyloxycarbonyloxy)ethyl]sulfone (BSOCOES) and
sulfo-BSOCOES, which are amine reactive cross-linking reagents,
cleavable by base (pH 11.6);

• 1,4-di-[3'-(2'-pyridyldithio)propionamido]butane (DPDPB), a
pyridyldithiol crosslinker which is cleavable by thiol exchange or
reduction;

• N-[4-(p-azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide
(APDP), a pyridyldithiol crosslinker which is cleavable by thiol
exchange or reduction;

• bis-[beta-4-(azidosalicylamido)ethyl]-disulfide, a photoreactive
crosslinker which is cleavable by thiol exchange or reduction;

• N-succinimidyl-(4-azidophenyl)-1,3'-dithiopropionate (SADP), a
photoreactive crosslinker which is cleavable by thiol exchange or
reduction;

• sulfosuccinimidyl-2-(7-azido-4-methylcoumarin-3-acetamide)ethyl-1,3'-
dithiopropionate (SAED), a photoreactive crosslinker which is cleavable
by thiol exchange or reduction;

• sulfosuccinimidyl-2-(m-azido-o-nitrobenzamido)-ethyl-
1,3'-dithiopropionate (SAND), a photoreactive crosslinker which is
cleavable by thiol exchange or reduction.

Other examples of cleavable linkers and the cleavage conditions that can
be used to release tags are as follows. A silyl linking group can be cleaved by fluoride
or under acidic conditions. A 3-, 4-, 5-, or 6-substituted-2-nitrobenzyloxy or 2-, 3-, 5-,
or 6-substituted-4-nitrobenzyloxy linking group can be cleaved by a photon source
(photolysis). A 3-, 4-, 5-, or 6-substituted-2-alkoxyphenoxy or 2-, 3-, 5-, or 6-
substituted-4-alkoxyphenoxy linking group can be cleaved by Ce(NH4)2(NO3)6
(oxidation). A NCO2 (urethane) linker can be cleaved by hydroxide (base), acid, or
LiAlH4 (reduction). A 3-pentenyl, 2-butenyl, or 1-butenyl linking group can be cleaved
by O3, OsO4/IO4-, or KMnO4 (oxidation). A 2-[3-, 4-, or 5-substituted-furyl]oxy linking
group can be cleaved by O2, Br2, MeOH, or acid.

Conditions for the cleavage of other labile linking groups include:
t-alkyloxy linking groups can be cleaved by acid; methyl(dialkyl)methoxy or 4-
substituted-2-alkyl-1,3-dioxolane-2-yl linking groups can be cleaved by H3O+;
2-silylethoxy linking groups can be cleaved by fluoride or acid; 2-(X)-ethoxy (where
X = keto, ester, amide, cyano, NO2, sulfide, sulfoxide, sulfone) linking groups can be
cleaved under alkaline conditions; 2-, 3-, 4-, 5-, or 6-substituted-benzyloxy linking
groups can be cleaved by acid or under reductive conditions; 2-butenyloxy linking
groups can be cleaved by (Ph3P)3RhCl(H); 3-, 4-, 5-, or 6-substituted-2-bromophenoxy
linking groups can be cleaved by Li, Mg, or BuLi; methylthiomethoxy linking groups
can be cleaved by Hg2+; 2-(X)-ethyloxy (where X = a halogen) linking groups can be
cleaved by Zn or Mg; 2-hydroxyethyloxy linking groups can be cleaved by oxidation
(e.g., with Pb(OAc)4).
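When selecting a linker for a given workflow, the linker and condition pairs listed in this section can be organized as a simple lookup table. The sketch below is an illustration only: it encodes a subset of the pairs stated above, the dictionary keys are shortened labels chosen here for readability, and the helper name linkers_cleaved_by is hypothetical.

```python
# Subset of the linker / cleavage-condition pairs listed in the text.
CLEAVAGE_CONDITIONS = {
    "silyl": ["fluoride", "acid"],
    "2-nitrobenzyloxy": ["photolysis"],
    "alkoxyphenoxy": ["oxidation"],
    "urethane": ["base", "acid", "reduction"],
    "t-alkyloxy": ["acid"],
    "2-silylethoxy": ["fluoride", "acid"],
    "methylthiomethoxy": ["Hg2+"],
}

def linkers_cleaved_by(condition):
    """Return the linker labels (sorted) that are cleaved by `condition`."""
    return sorted(
        linker
        for linker, conditions in CLEAVAGE_CONDITIONS.items()
        if condition in conditions
    )

acid_labile = linkers_cleaved_by("acid")
photolabile = linkers_cleaved_by("photolysis")
print(acid_labile, photolabile)
```

A table of this kind makes it easy to check that the chosen cleavage condition releases the intended tag without also cleaving another linker present in the same reaction.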

Preferred linkers are those that are cleaved by acid or photolysis. Several
of the acid-labile linkers that have been developed for solid phase peptide synthesis are
useful for linking tags to MOIs. Some of these linkers are described in a recent review
by Lloyd-Williams et al. (Tetrahedron 49:11065-11133, 1993). One useful type of
linker is based upon p-alkoxybenzyl alcohols, of which two, 4-
hydroxymethylphenoxyacetic acid and 4-(4-hydroxymethyl-3-methoxyphenoxy)butyric
acid, are commercially available from Advanced ChemTech (Louisville, KY). Both
linkers can be attached to a tag via an ester linkage to the benzylalcohol, and to an
amine-containing MOI via an amide linkage to the carboxylic acid. Tags linked by
these molecules are released from the MOI with varying concentrations of
trifluoroacetic acid. The cleavage of these linkers results in the liberation of a
carboxylic acid on the tag. Acid cleavage of tags attached through related linkers, such
as 2,4-dimethoxy-4'-(carboxymethyloxy)-benzhydrylamine (available from Advanced
ChemTech in FMOC-protected form), results in liberation of a carboxylic amide on the
released tag.
The photolabile linkers useful for this application have also been for the
most part developed for solid phase peptide synthesis (see Lloyd-Williams review).
These linkers are usually based on 2-nitrobenzylesters or 2-nitrobenzylamides. Two
examples of photolabile linkers that have recently been reported in the literature are 4-
(4-(1-(Fmoc-amino)ethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (Holmes and Jones,
J. Org. Chem. 60:2318-2319, 1995) and 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic
acid (Brown et al., Molecular Diversity 1:4-12, 1995). Both linkers can be attached via
the carboxylic acid to an amine on the MOI. The attachment of the tag to the linker is
made by forming an amide between a carboxylic acid on the tag and the amine on the
linker. Cleavage of photolabile linkers is usually performed with UV light of 350 nm
wavelength at intensities and times known to those in the art. Examples of commercial
sources of instruments for photochemical cleavage are Aura Industries Inc. (Staten
Island, NY) and Agrenetics (Wilmington, MA). Cleavage of the linkers results in
liberation of a primary amide on the tag. Examples of photocleavable linkers include
nitrophenyl glycine esters, exo- and endo-2-benzonorborneyl chlorides and methane
sulfonates, and 3-amino-3(2-nitrophenyl) propionic acid. Examples of enzymatic
cleavage include esterases which will cleave ester bonds, nucleases which will cleave
phosphodiester bonds, proteases which cleave peptide bonds, etc.

E. DETECTION OF TAGS

Detection methods typically rely on the absorption and emission in some
type of spectral field. When atoms or molecules absorb light, the incoming energy
excites a quantized structure to a higher energy level. The type of excitation depends on
the wavelength of the light. Electrons are promoted to higher orbitals by ultraviolet or
visible light, molecular vibrations are excited by infrared light, and rotations are excited
by microwaves. An absorption spectrum is the absorption of light as a function of
wavelength. The spectrum of an atom or molecule depends on its energy level
structure. Absorption spectra are useful for identification of compounds. Specific
absorption spectroscopic methods include atomic absorption spectroscopy (AA),
infrared spectroscopy (IR), and UV-vis spectroscopy (uv-vis).
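The dependence of the type of excitation on wavelength follows from the photon energy, E = hc/lambda. The short calculation below is an illustration added for clarity: the physical constants are the standard SI values, while the infrared and microwave wavelengths are representative values chosen here, not figures from the text. The 350 nm wavelength is the one cited above for photolytic cleavage.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant, J*s
LIGHT_SPEED_M_S = 2.99792458e8     # speed of light, m/s
JOULES_PER_EV = 1.602176634e-19    # J per electronvolt

def photon_energy_ev(wavelength_m):
    """Photon energy in electronvolts for a wavelength given in meters."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV

uv = photon_energy_ev(350e-9)       # UV light used for photocleavage
infrared = photon_energy_ev(10e-6)  # representative mid-infrared wavelength
microwave = photon_energy_ev(1e-2)  # representative microwave wavelength
print(round(uv, 2), infrared, microwave)
```

The ultraviolet photon carries roughly 3.5 eV, which is on the scale of electronic transitions, whereas the infrared and microwave photons carry progressively less energy, in line with the vibrational and rotational excitations described above.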
container contains a vector, a second, third, fourth and fifth containers contain
compounds of the formula:

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid; and

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; such that

the MOI for the second, third, fourth and fifth containers is identical and
complementary to a portion of the vector within the set of containers, and the T^ms group
within each container is different from the other T^ms groups in the kit.

49. A kit according to claim 48 wherein the plurality is at least 3. 

50. A kit according to claim 48 wherein the plurality is at least 5. 

51. A system for determining the sequence of a nucleic acid molecule, 
comprising a separation apparatus that separates tagged nucleic acid fragments, an 
apparatus that cleaves from a tagged nucleic acid fragment a tag which is correlative with 
a particular nucleotide and detectable by electrochemical detection, and an apparatus for 
potentiostatic amperometry. 
wherein between sets, the T^ms groups differ by at least 2 amu.

45. A composition according to claim 44 wherein the plurality is at
least 3.

46. A composition according to claim 44 wherein the plurality is at
least 5.

47. A composition comprising a first plurality of sets of compounds
according to claim 44, and a second plurality of sets of compounds having the formula

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid;

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; and

wherein all members within the second plurality have an MOI sequence
which terminates with the same dideoxynucleotide selected from ddAMP, ddGMP,
ddCMP and ddTMP; with the proviso that the dideoxynucleotide present in the
compounds of the first plurality is not the same dideoxynucleotide present in the
compounds of the second plurality.

48. A kit for DNA sequencing analysis comprising a plurality of 
container sets, each container set comprising at least five containers, wherein a first 
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid; and

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; and



41. A composition according to claim 40 further comprising buffer, 
having a pH of about 5 to about 9. 



42. A composition according to claim 40 further comprising an 
enzyme and one of dATP, dGTP, dCTP, and dTTP.

43. A composition according to claim 40 further comprising an 
enzyme and one of ddATP, ddGTP, ddCTP, and ddTTP. 

44. A composition comprising a plurality of sets of compounds, each 
set of compounds having the formula: 

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid;

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI;

wherein within a set, all members have the same T^ms group, and the MOI
fragments have variable lengths that terminate with the same dideoxynucleotide selected
from ddAMP, ddGMP, ddCMP and ddTMP; and
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subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and 
organic acid; 

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a 
location other than the 3' end of the MOI; and 

wherein no two compounds have either the same T^ms or the same MOI. 

35. A composition according to claim 34 wherein the plurality is greater than 2. 

36. A composition according to claim 34 wherein the plurality is greater than 4. 

37. A composition according to claim 34 wherein the nucleic acid 
fragment has a sequence complementary to a portion of a vector, wherein the fragment is 
capable of priming nucleotide synthesis. 

38. A composition according to claim 34 wherein the T^ms groups of 
members of the plurality differ by at least 2 amu. 

39. A composition according to claim 34 wherein the T^ms groups of 
members of the plurality differ by at least 4 amu. 

40. A composition comprising water and a compound of the formula: 

T^ms-L-MOI 

wherein, 

T^ms is an organic group detectable by mass spectrometry, comprising 
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, 
nitrogen, sulfur, phosphorus and iodine; 

L is an organic group which allows a T^ms-containing moiety to be cleaved 
from the remainder of the compound, wherein the T^ms-containing moiety comprises a 
functional group which supports a single ionized charge state when the compound is 
acid, N-Formyl-DL-Trp-OH, (R)-(+)-α-Methoxy-α-(trifluoromethyl)phenylacetic acid, 
Bz-DL-Leu-OH, 4-(Trifluoromethoxy)phenoxyacetic acid, 4-Heptyloxybenzoic acid, 
2,3,4-Trimethoxycinnamic acid, 2,6-Dimethoxybenzoyl-Gly-OH, 3-(3,4,5- 
Trimethoxyphenyl)propionic acid, 2,3,4,5,6-Pentafluorophenoxyacetic acid, N-(2,4- 
Difluorophenyl)glutaramic acid, N-Undecanoyl-Gly-OH, 2-(4-Fluorobenzoyl)benzoic 
acid, 5-Trifluoromethoxyindole-2-carboxylic acid, N-(2,4-Difluorophenyl)diglycolamic 
acid, Ac-L-Trp-OH, Tfa-L-Phenylglycine-OH, 3-Iodobenzoic acid, 3-(4-n- 
Pentylbenzoyl)propionic acid, 2-Phenyl-4-quinolinecarboxylic acid, 4-Octyloxybenzoic 
acid, Bz-L-Met-OH, 3,4,5-Triethoxybenzoic acid, N-Lauroyl-Gly-OH, 3,5- 
Bis(trifluoromethyl)benzoic acid, Ac-5-Methyl-DL-Trp-OH, 2-Iodophenylacetic acid, 3- 
Iodo-4-methylbenzoic acid, 3-(4-n-Hexylbenzoyl)propionic acid, N-Hexanoyl-L-Phe-OH, 
4-Nonyloxybenzoic acid, 4'-(Trifluoromethyl)-2-biphenylcarboxylic acid, Bz-L-Phe-OH, 
N-Tridecanoyl-Gly-OH, 3,5-Bis(trifluoromethyl)phenylacetic acid, 3-(4-n- 
Heptylbenzoyl)propionic acid, N-Heptanoyl-L-Phe-OH, 4-Decyloxybenzoic acid, N- 
(α,α,α-trifluoro-m-tolyl)anthranilic acid, Niflumic acid, 4-(2- 
Hydroxyhexafluoroisopropyl)benzoic acid, N-Myristoyl-Gly-OH, 3-(4-n- 
Octylbenzoyl)propionic acid, N-Octanoyl-L-Phe-OH, 4-Undecyloxybenzoic acid, 3- 
(3,4,5-Trimethoxyphenyl)propionyl-Gly-OH, 8-Iodonaphthoic acid, N-Pentadecanoyl- 
Gly-OH, 4-Dodecyloxybenzoic acid, N-Palmitoyl-Gly-OH, and N-Stearoyl-Gly-OH. 

34. A composition comprising a plurality of compounds of the formula: 

T^ms-L-MOI 

wherein, 

T^ms is an organic group detectable by mass spectrometry, comprising 
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, 
nitrogen, sulfur, phosphorus and iodine; 

L is an organic group which allows a T^ms-containing moiety to be cleaved 
from the remainder of the compound, wherein the T^ms-containing moiety comprises a 
functional group which supports a single ionized charge state when the compound is 
acid, p-Anisic acid, 1,2,5-Trimethylpyrrole-3-carboxylic acid, 3-Fluoro-4-methylbenzoic 
acid, Ac-DL-Propargylglycine, 3-(Trifluoromethyl)butyric acid, 1-Piperidinepropionic 
acid, N-Acetylproline, 3,5-Difluorobenzoic acid, Ac-L-Val-OH, Indole-2-carboxylic acid, 
2-Benzofurancarboxylic acid, Benzotriazole-5-carboxylic acid, 4-n-Propylbenzoic acid, 
3-Dimethylaminobenzoic acid, 4-Ethoxybenzoic acid, 4-(Methylthio)benzoic acid, N-(2- 
Furoyl)glycine, 2-(Methylthio)nicotinic acid, 3-Fluoro-4-methoxybenzoic acid, Tfa-Gly- 
OH, 2-Naphthoic acid, Quinaldic acid, Ac-L-Ile-OH, 3-Methylindene-2-carboxylic acid, 2- 
Quinoxalinecarboxylic acid, 1-Methylindole-2-carboxylic acid, 2,3,6-Trifluorobenzoic 
acid, N-Formyl-L-Met-OH, 2-[2-(2-Methoxyethoxy)ethoxy]acetic acid, 4-n-Butylbenzoic 
acid, N-Benzoylglycine, 5-Fluoroindole-2-carboxylic acid, 4-n-Propoxybenzoic acid, 4- 
Acetyl-3,5-dimethyl-2-pyrrolecarboxylic acid, 3,5-Dimethoxybenzoic acid, 2,6- 
Dimethoxynicotinic acid, Cyclohexanepentanoic acid, 2-Naphthylacetic acid, 4-(1H- 
Pyrrol-1-yl)benzoic acid, Indole-3-propionic acid, m-Trifluoromethylbenzoic acid, 5- 
Methoxyindole-2-carboxylic acid, 4-Pentylbenzoic acid, Bz-β-Ala-OH, 4- 
Diethylaminobenzoic acid, 4-n-Butoxybenzoic acid, 3-Methyl-5-CF3-isoxazole-4- 
carboxylic acid, (3,4-Dimethoxyphenyl)acetic acid, 4-Biphenylcarboxylic acid, Pivaloyl- 
Pro-OH, Octanoyl-Gly-OH, (2-Naphthoxy)acetic acid, Indole-3-butyric acid, 4- 
(Trifluoromethyl)phenylacetic acid, 5-Methoxyindole-3-acetic acid, 4- 
(Trifluoromethoxy)benzoic acid, Ac-L-Phe-OH, 4-Pentyloxybenzoic acid, Z-Gly-OH, 4- 
Carboxy-N-(fur-2-ylmethyl)pyrrolidin-2-one, 3,4-Diethoxybenzoic acid, 2,4-Dimethyl-5- 
CO2Et-pyrrole-3-carboxylic acid, N-(2-Fluorophenyl)succinamic acid, 3,4,5- 
Trimethoxybenzoic acid, N-Phenylanthranilic acid, 3-Phenoxybenzoic acid, Nonanoyl- 
Gly-OH, 2-Phenoxypyridine-3-carboxylic acid, 2,5-Dimethyl-1-phenylpyrrole-3- 
carboxylic acid, trans-4-(Trifluoromethyl)cinnamic acid, (5-Methyl-2-phenyloxazol-4- 
yl)acetic acid, 4-(2-Cyclohexenyloxy)benzoic acid, 5-Methoxy-2-methylindole-3-acetic 
acid, trans-4-Cotininecarboxylic acid, Bz-5-Aminovaleric acid, 4-Hexyloxybenzoic acid, 
N-(3-Methoxyphenyl)succinamic acid, Z-Sar-OH, 4-(3,4-Dimethoxyphenyl)butyric acid, 
Ac-o-Fluoro-DL-Phe-OH, N-(4-Fluorophenyl)glutaramic acid, 4'-Ethyl-4- 
biphenylcarboxylic acid, 1,2,3,4-Tetrahydroacridinecarboxylic acid, 3- 
Phenoxyphenylacetic acid, N-(2,4-Difluorophenyl)succinamic acid, N-Decanoyl-Gly- 
OH, (+)-6-Methoxy-α-methyl-2-naphthaleneacetic acid, 3-(Trifluoromethoxy)cinnamic 
[Structural formulae not recoverable from the source text: amide-linked alternatives containing -C(=O)NH-(C1-C10)- and -C(=O)NH-(C2-C10)- chains.] 




33. A compound according to any one of claims 25-29 wherein T^2 has 
the structure which results when one of the following organic acids is condensed with an 
amine group to form T^2-C(=O)-N(R^1)-: Formic acid, Acetic acid, Propiolic acid, 
Propionic acid, Fluoroacetic acid, 2-Butynoic acid, Cyclopropanecarboxylic acid, Butyric 
acid, Methoxyacetic acid, Difluoroacetic acid, 4-Pentynoic acid, Cyclobutanecarboxylic 
acid, 3,3-Dimethylacrylic acid, Valeric acid, N,N-Dimethylglycine, N-Formyl-Gly-OH, 
Ethoxyacetic acid, (Methylthio)acetic acid, Pyrrole-2-carboxylic acid, 3-Furoic acid, 
Isoxazole-5-carboxylic acid, trans-3-Hexenoic acid, Trifluoroacetic acid, Hexanoic acid, 
Ac-Gly-OH, 2-Hydroxy-2-methylbutyric acid, Benzoic acid, Nicotinic acid, 2- 
Pyrazinecarboxylic acid, 1-Methyl-2-pyrrolecarboxylic acid, 2-Cyclopentene-1-acetic 
acid, Cyclopentylacetic acid, (S)-(-)-2-Pyrrolidone-5-carboxylic acid, N-Methyl-L- 
proline, Heptanoic acid, Ac-β-Ala-OH, 2-Ethyl-2-hydroxybutyric acid, 2-(2- 
Methoxyethoxy)acetic acid, p-Toluic acid, 6-Methylnicotinic acid, 5-Methyl-2- 
pyrazinecarboxylic acid, 2,5-Dimethylpyrrole-3-carboxylic acid, 4-Fluorobenzoic acid, 
3,5-Dimethylisoxazole-4-carboxylic acid, 3-Cyclopentylpropionic acid, Octanoic acid, 
N,N-Dimethylsuccinamic acid, Phenylpropiolic acid, Cinnamic acid, 4-Ethylbenzoic 
[Structural formula not recoverable from the source text.] 

wherein T^5 is an organic moiety of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ 
wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies 
of the C, N, O, S and P atoms; and T^5 includes a tertiary or quaternary amine or an 
organic acid; and m is an integer ranging from 0-49. 

31. A compound according to any one of claims 29 and 30 
wherein -Amide-T 5 is selected from: 



[Structural formulae not recoverable from the source text: amide-linked alternatives containing -NHC(=O)- groups with (C1-C10) chains and amine substituents.] 




32. A compound according to any of claims 29 and 30 wherein 
-Amide-T 5 is selected from: 
T^2 and T^4 are organic moieties of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ 
wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies 
of the C, N, O, S and P atoms; 

Amide is -N-C(=O)- or -C(=O)-N-; 

R^1 is hydrogen or C1-10 alkyl; 

c is an integer ranging from 0 to 4; 

X is defined according to claim 1; and 

n is an integer ranging from 1 to 50 such that when n is greater than 1, G, 
c, Amide, R^1 and T^4 are independently selected. 

29. A compound according to claim 28 having the formula: 

[Structural formula not recoverable from the source text.] 

wherein T^5 is an organic moiety of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ 
wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies 
of the C, N, O, S and P atoms; and T^5 includes a tertiary or quaternary amine or an 
organic acid; and m is an integer ranging from 0-49. 

30. A compound according to claim 28 having the formula: 
alkyl, morpholino-alkyl, thiomorpholino-alkyl, morpholino carbonyl-substituted alkyl, 
thiomorpholinocarbonyl-substituted alkyl, [N-(alkyl, alkenyl or alkynyl)- or N,N-[dialkyl, 
dialkenyl, dialkynyl or (alkyl, alkenyl)-amino]carbonyl-substituted alkyl, 
heterocyclylaminocarbonyl, heterocyclylalkyleneaminocarbonyl, 
heterocyclylaminocarbonyl-substituted alkyl, heterocyclylalkyleneaminocarbonyl- 
substituted alkyl, N,N-[dialkyl]alkyleneaminocarbonyl, N,N- 
[dialkyl]alkyleneaminocarbonyl-substituted alkyl, alkyl-substituted heterocyclylcarbonyl, 
alkyl-substituted heterocyclylcarbonyl-alkyl, carboxyl-substituted alkyl, dialkylamino- 
substituted acylaminoalkyl and amino acid side chains selected from arginine, asparagine, 
glutamine, S-methyl cysteine, methionine and corresponding sulfoxide and sulfone 
derivatives thereof, glycine, leucine, isoleucine, allo-isoleucine, tert-leucine, norleucine, 
phenylalanine, tyrosine, tryptophan, proline, alanine, ornithine, histidine, glutamine, 
valine, threonine, serine, aspartic acid, beta-cyanoalanine, and allothreonine; alkynyl and 
heterocyclylcarbonyl, aminocarbonyl, amido, mono- or dialkylaminocarbonyl, mono- or 
diarylaminocarbonyl, alkylarylaminocarbonyl, diarylaminocarbonyl, mono- or 
diacylaminocarbonyl, aromatic or aliphatic acyl, alkyl optionally substituted by 
substituents selected from amino, carboxy, hydroxy, mercapto, mono- or dialkylamino, 
mono- or diarylamino, alkylarylamino, diarylamino, mono- or diacylamino, alkoxy, 
alkenoxy, aryloxy, thioalkoxy, thioalkenoxy, thioalkynoxy, thioaryloxy and heterocyclyl. 

28. A compound according to claim 25 having the formula: 

[Structural formula not recoverable from the source text.] 

wherein 

G is (CH2)1-6 wherein a hydrogen on one and only one of the CH2 groups 
of each G is replaced with -(CH2)c-Amide-T^4; 
26. A compound according to claim 25 wherein T^2 is selected from 
hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl-S-hydrocarbylene, 
hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-amide-hydrocarbylene, N- 
(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene, hydrocarbylacyl- 
hydrocarbylene, heterocyclylhydrocarbyl wherein the heteroatom(s) are selected from 
oxygen, nitrogen, sulfur and phosphorus, substituted heterocyclylhydrocarbyl wherein the 
heteroatom(s) are selected from oxygen, nitrogen, sulfur and phosphorus and the 
substituents are selected from hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl- 
NH-hydrocarbylene, hydrocarbyl-S-hydrocarbylene, N-(hydrocarbyl)hydrocarbylene, 
N,N-di(hydrocarbyl)hydrocarbylene and hydrocarbylacyl-hydrocarbylene, as well as 
derivatives of any of the foregoing wherein one or more hydrogens is replaced with an 
equal number of fluorides. 

27. A compound according to claim 25 wherein T^3 has the formula 
-G(R^2)-, G is alkylene having a single R^2 substituent, and R^2 is selected from alkyl, 
alkenyl, alkynyl, cycloalkyl, aryl-fused cycloalkyl, cycloalkenyl, aryl, aralkyl, 
aryl-substituted alkenyl or alkynyl, cycloalkyl-substituted alkyl, cycloalkenyl-substituted 
cycloalkyl, biaryl, alkoxy, alkenoxy, alkynoxy, aralkoxy, aryl-substituted alkenoxy or 
alkynoxy, alkylamino, alkenylamino or alkynylamino, aryl-substituted alkylamino, 
aryl-substituted alkenylamino or alkynylamino, aryloxy, arylamino, 
N-alkylurea-substituted alkyl, N-arylurea-substituted alkyl, 
alkylcarbonylamino-substituted alkyl, aminocarbonyl-substituted alkyl, heterocyclyl, 
heterocyclyl-substituted alkyl, heterocyclyl-substituted amino, carboxyalkyl substituted 
aralkyl, oxocarbocyclyl-fused aryl and heterocyclylalkyl; cycloalkenyl, aryl-substituted 
alkyl and, aralkyl, hydroxy-substituted alkyl, alkoxy-substituted alkyl, aralkoxy- 
substituted alkyl, alkoxy-substituted alkyl, aralkoxy-substituted alkyl, amino-substituted 
alkyl, (aryl-substituted alkyloxycarbonylamino)-substituted alkyl, thiol-substituted alkyl, 
alkylsulfonyl-substituted alkyl, (hydroxy-substituted alkylthio)-substituted alkyl, 
thioalkoxy-substituted alkyl, hydrocarbylacylamino-substituted alkyl, 
heterocyclylacylamino-substituted alkyl, hydrocarbyl-substituted-heterocyclylacylamino- 
substituted alkyl, alkylsulfonylamino-substituted alkyl, arylsulfonylamino-substituted 
23. A compound according to claim 20 wherein L^3 is selected from a 
direct bond, a hydrocarbylene, -O-hydrocarbylene, and hydrocarbylene-(O- 
hydrocarbylene)n-H, and n is an integer ranging from 1 to 10. 

24. A compound according to claim 15 wherein -L-X has the formula: 

[Structural formula not recoverable from the source text.] 

wherein one or more of positions b, c, d or e is substituted with hydrogen, 
alkyl, alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide; and R^1 is hydrogen or 
hydrocarbyl. 

25. A compound according to claim 15 wherein T^ms has the formula: 

T^2-(J-T^3-)n- 

T^2 is an organic moiety formed from carbon and one or more of hydrogen, 
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass of 15 to 500 
daltons; 

T^3 is an organic moiety formed from carbon and one or more of hydrogen, 
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass of 50 to 1000 
daltons; 

J is a direct bond or a functional group selected from amide, ester, amine, 
sulfide, ether, thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate, 
Schiff base, reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate, 
phosphoramide, phosphonamide, sulfonate, sulfonamide or carbon-carbon bond; and 

n is an integer ranging from 1 to 50, and when n is greater than 1, each T^3 
and J is independently selected. 
19. A compound according to claim 15 wherein L is selected from L^hν, 
L^acid, L^base, L^[O], L^[R], L^enz, L^elc, L^Δ and L^ss, where actinic radiation, acid, base, oxidation, 
reduction, enzyme, electrochemical, thermal and thiol exchange, respectively, cause the 
T^ms-containing moiety to be cleaved from the remainder of the molecule. 

20. A compound according to claim 19 wherein L^hν has the formula 
L^1-L^2-L^3, wherein L^2 is a molecular fragment that absorbs actinic radiation to promote the 
cleavage of T^ms from X, and L^1 and L^3 are independently a direct bond or an organic 
moiety, where L^1 separates L^2 from T^ms and L^3 separates L^2 from X, and neither L^1 nor L^3 
undergo bond cleavage when L^2 absorbs the actinic radiation. 

21. A compound according to claim 20 wherein -L^2-L^3 has the formula: 

[Structural formula not recoverable from the source text.] 

with one carbon atom at positions a, b, c, d or e being substituted with 
-L^3-X and optionally one or more of positions b, c, d or e being substituted with alkyl, 
alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide; and R^1 is hydrogen or 
hydrocarbyl. 

22. A compound according to claim 21 wherein X is -C(=O)-R^2 and R^2 
is -OH or a group that either protects or activates a carboxylic acid for coupling with 
another moiety. 
T^ms-L-X 

wherein, 

T^ms is an organic group detectable by mass spectrometry, comprising 
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, 
nitrogen, sulfur, phosphorus and iodine; 

L is an organic group which allows a T^ms-containing moiety to be cleaved 
from the remainder of the compound, wherein the T^ms-containing moiety comprises a 
functional group which supports a single ionized charge state when the compound is 
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and 
organic acid; 

X is a functional group selected from hydroxyl, amino, thiol, carboxylic 
acid, haloalkyl, and derivatives thereof which either activate or inhibit the activity of the 
group toward coupling with other moieties, or is a nucleic acid fragment attached to L at 
other than the 3' end of the nucleic acid fragment; 

with the provisos that the compound is not bonded to a solid support 
through X nor has a mass of less than 250 daltons. 

16. A compound according to claim 15 wherein T^ms has a mass of from 
15 to 10,000 daltons and a molecular formula of C1-500N0-100O0-100S0-10P0-10HαFβIδ wherein 
the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, 
N, O, P and S atoms. 

17. A compound according to claim 15 wherein T^ms and L are bonded 
together through a functional group selected from amide, ester, ether, amine, sulfide, 
thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate, Schiff base, 
reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate, phosphoramide, 
phosphonamide, sulfonate, sulfonamide or carbon-carbon bond. 

18. A compound according to claim 17 wherein the functional group is 
selected from amide, ester, amine, urea and carbamate. 
6. The method according to claim 2 wherein the tags are detected by 
coulometric detectors or amperometric detectors. 

7. The method according to claims 1 or 2 wherein the tagged nucleic 
acid fragments are generated in step (a) from a 5' terminus to a 3' terminus. 

8. The method according to claims 1 or 2 wherein step (a) generates 
more than four of the tagged nucleic acid fragments and each tag is unique for a nucleic 
acid fragment. 

9. The method according to claims 1 or 2 wherein steps (b), (c) and 
(d) are performed in a continuous manner. 

10. The method according to claims 1 or 2 wherein steps (b), (c) and 
(d) are performed in a continuous manner on a system. 

11. The method according to claims 1 or 2 wherein one or more of the 
steps is automated. 

12. The method according to claims 1 or 2 wherein the tagged 
fragments are generated from oligonucleotide primers that are conjugated to a tag at other 
than the 3' end of the primer. 

13. The method according to claims 1 or 2 wherein the tagged 
fragments are generated from tagged dideoxynucleotide terminators. 

14. The method according to claims 1 or 2 wherein at least one tagged 
nucleic acid fragment is a compound according to any one of claims 15 to 33. 



15. A compound of the formula: 
Claims 

We Claim: 

1. A method for determining the sequence of a nucleic acid molecule, 
comprising: 

(a) generating tagged nucleic acid fragments which are 
complementary to a selected target nucleic acid molecule, wherein a tag is correlative 
with a particular nucleotide and detectable by non-fluorescent spectrometry or 
potentiometry; 

(b) separating the tagged fragments by sequential length; 

(c) cleaving the tags from the tagged fragments; and 

(d) detecting the tags by non-fluorescent spectrometry or 
potentiometry, and therefrom determining the sequence of the nucleic acid molecule. 

2. The method according to claim 1 wherein the detection of the tags 
is by mass spectrometry, infrared spectrometry, ultraviolet spectrometry or potentiostatic 
amperometry. 

3. The method according to claims 1 or 2 wherein the tagged 
fragments are separated in step (b) by a method selected from gel electrophoresis, 
capillary electrophoresis, micro-channel electrophoresis and HPLC. 

4. The method according to claims 1 or 2 wherein the tagged 
fragments are cleaved in step (c) by a method selected from oxidation, reduction, acid- 
labile, base-labile, enzymatic, electrochemical, thermal, thiol exchange and photolabile 
methods. 

5. The method according to claim 2 wherein the tags are detected by 
time-of-flight mass spectrometry, quadrupole mass spectrometry, magnetic sector mass 
spectrometry or electric sector mass spectrometry. 
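
By way of illustration only, the read-out implied by steps (b) through (d) of claim 1 can be sketched in Python as follows. The tag masses, the function name `read_sequence` and the detection tuples are hypothetical and are not taken from the specification; the sketch assumes that each detected tag has already been paired with the length of the fragment from which it was cleaved, and that each tag mass is correlative with exactly one terminal nucleotide.

```python
def read_sequence(detections, tag_to_base):
    """Order detected tags by fragment length and translate each tag to its base.

    detections  -- iterable of (fragment_length, tag_mass) pairs (hypothetical format)
    tag_to_base -- mapping from tag mass to the nucleotide that the tag reports
    """
    ordered = sorted(detections, key=lambda pair: pair[0])
    return "".join(tag_to_base[mass] for _length, mass in ordered)


# Hypothetical tag masses (amu), spaced 4 amu apart purely for illustration.
TAG_TO_BASE = {301: "A", 305: "C", 309: "G", 313: "T"}

# Fragments of 19 to 22 nucleotides, detected out of order.
detections = [(21, 309), (19, 301), (20, 313), (22, 305)]

print(read_sequence(detections, TAG_TO_BASE))  # ATGC
```

The string returned is the order of terminal nucleotides on the synthesized fragments, which are complementary to the target; the target sequence itself would follow by taking the complement.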
TGTAAAACGA CGGCCAGCGT AC 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TGTAAAACGA CGGCCAGCGT ACC 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 
TCGAGGTCGA CGGTATCG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCCGCTCTAG AACTAGTG 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TGTAAAACGA CGGCCAGCG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TGTAAAACGA CGGCCAGCGT 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TGTAAAACGA CGGCCAGCGT A 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

TGTAAAACGA CGGCCAGTAT GCAT 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TGTAAAACGA CGGCCAGTAT GCATG 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TGTAAAACGA CGGCCACG 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TGTAAAACGA CGGCCAGTAT G 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TGTAAAACGA CGGCCAGTAT GC 22 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TGTAAAACGA CGGCCAGTAT GCA 23 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
TGTAAAACGA CGGCCAGT 18 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
TGTAAAACGA CGGCCAGTA 19 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TGTAAAACGA CGGCCAGTAT 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANTS: Van Ness, Jeffrey 
Tabone, John C. 
Howbert, J. Jeffry 
Mulligan, John T. 

(ii) TITLE OF INVENTION: METHODS AND COMPOSITIONS FOR DETERMINING 
THE SEQUENCE OF NUCLEIC ACID MOLECULES 

(iii) NUMBER OF SEQUENCES: 16 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SEED and BERRY 

(B) STREET: 6300 Columbia Center, 701 Fifth Avenue 

(C) CITY: Seattle 

(D) STATE: Washington 

(E) COUNTRY: USA 

(F) ZIP: 98104-7092 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 22-JAN-1997 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: McMasters, David D. 

(B) REGISTRATION NUMBER: 33,963 

(C) REFERENCE/DOCKET NUMBER: 240052.416 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (206) 622-4900 

(B) TELEFAX: (206) 682-6031 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
various modifications may be made without deviating from the spirit and scope of the 
invention. Accordingly, the invention is not limited except as by the appended claims. 
reactions, using modified T7 DNA polymerase and α-[32P]dATP (>1000 Ci/mmole) 
using the protocol described above. 

G. cDNA sequencing based on PCR and random shotgun cloning 

The following is a method for sequencing cloned cDNAs based on PCR 
amplification, random shotgun cloning, and automated fluorescent sequencing. This 
PCR-based approach uses a primer pair between the usual "universal" forward and 
reverse priming sites and the multiple cloning sites of the Stratagene Bluescript vector. 
These two PCR primers, with the sequence 5'-TCGAGGTCGACGGTATCG-3' (Seq. 
ID No. 15) for the forward or -16bs primer and 5'-GCCGCTCTAGAACTAGTG-3' 
(Seq. ID No. 16) for the reverse or +19bs primer, may be used to amplify sufficient 
quantities of cDNA inserts in the 1.2 to 3.4 kb size range so that the random shotgun 
sequencing approach described below can be implemented. 

The following is the protocol. Incubate four 100 µl PCR reactions, each 
containing approximately 100 ng of plasmid DNA, 100 pmoles of each primer, 50 mM 
KCl, 10 mM Tris-HCl pH 8.5, 1.5 mM MgCl2, 0.2 mM of each dNTP, and 5 units of 
PE-Cetus Amplitaq in 0.5 ml snap cap tubes for 25 cycles of 95°C for 1 minute, 55°C 
for 1 minute and 72°C for 2 minutes in a PE-Cetus 48 tube DNA Thermal Cycler. After 
pooling the four reactions, the aqueous solution containing the PCR product is placed in 
a nebulizer, brought to 2.0 ml by adding approximately 0.5 to 1.0 ml of glycerol, and 
equilibrated at -20°C by placing it in either an isopropyl alcohol/dry ice or saturated 
aqueous NaCl/dry ice bath for 10 minutes. The sample is nebulized at -20°C by 
applying 25-30 psi nitrogen pressure for 2.5 min. Following ethanol precipitation to 
concentrate the sheared PCR product, the fragments were blunt ended and 
phosphorylated by incubation with the Klenow fragment of E. coli DNA polymerase 
and T4 polynucleotide kinase as described previously. Fragments in the 0.4 to 0.7 kb 
range were obtained by elution from a low melting agarose gel. 
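
The arithmetic implied by the cycling conditions above may be checked with a short Python sketch. The function and variable names are illustrative only; the figures are those stated in the protocol (25 cycles of 1, 1 and 2 minutes; 100 pmoles of each primer in a 100 µl reaction), and ramp times between temperatures are assumed to be excluded.

```python
# Cycling profile as stated in the protocol: (step name, temperature in deg C, hold in minutes).
CYCLE_STEPS = [("denature", 95, 1.0), ("anneal", 55, 1.0), ("extend", 72, 2.0)]
CYCLES = 25


def total_hold_minutes(cycles, steps):
    """Sum of programmed hold times; ramp times between temperatures are ignored."""
    return cycles * sum(minutes for _name, _temp_c, minutes in steps)


def concentration_micromolar(amount_pmol, volume_ul):
    """pmol per microlitre is numerically equal to micromoles per litre (µM)."""
    return amount_pmol / volume_ul


print(total_hold_minutes(CYCLES, CYCLE_STEPS))  # 100.0 minutes of programmed holds
print(concentration_micromolar(100, 100))       # 1.0 µM of each primer
```

Under these assumptions the programmed holds total 100 minutes, and each primer is present at 1 µM in the reaction.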

From the foregoing, it will be appreciated that, although specific 
embodiments of the invention have been described herein for purposes of illustration, 
collection, the ABI software will automatically open the data analysis software, which 
will create the imaged gel file, extract the data for each sample lane, and process the 
data. 

F. Double-stranded sequencing of cDNA clones containing long poly(A) tails 
using anchored poly(dT) primers 

Double-stranded templates of cDNAs containing long poly(A) tracts are 
difficult to sequence with vector primers which anneal downstream of the poly(A) tail. 
Sequencing with these primers results in a long poly(T) ladder followed by a sequence 
which may be difficult to read. To circumvent this problem, three primers which 
contain (dT)17 and either (dA) or (dC) or (dG) at the 3' end were designed to 'anchor' the 
primers and allow sequencing of the region immediately upstream of the poly(A) 
region. Using this protocol, over 300 bp of readable sequence could be obtained. The 
sequence of the opposite strand of these cDNAs was determined using insert-specific 
primers upstream of the poly(A) region. The ability to directly obtain sequence 
immediately upstream from the poly(A) tail of cDNAs should be of particular 
importance to large scale efforts to generate sequence-tagged sites (STSs) from cDNAs. 

The protocol is as follows. Synthesize anchored poly(dT)17 with 
anchors of (dA) or (dC) or (dG) at the 3' end on a DNA synthesizer and use after 
purification on Oligonucleotide Purification Cartridges (Amicon, Beverly, MA). For 
sequencing with anchored primers, denature 5-10 µg of plasmid DNA in a total volume 
of 50 µl containing 0.2 M sodium hydroxide and 0.16 mM EDTA by incubation at 65°C 
for 10 minutes. Add the three poly(dT) anchored primers (2 pmol of each) and 
immediately place the mixture on ice. Neutralize the solution by adding 5 ml of 5 M 
ammonium acetate, pH 7.0. 
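
The three anchored primers described above can be written out explicitly. The following Python sketch is illustrative only (the function name is hypothetical); it assumes each primer is a run of 17 thymidines followed by a single anchor base at the 3' end, as stated in the protocol.

```python
def anchored_poly_dt_primers(dt_length=17, anchors=("A", "C", "G")):
    """Return primer sequences written 5' to 3': a (dT) run plus one 3' anchor base."""
    return ["T" * dt_length + base for base in anchors]


for primer in anchored_poly_dt_primers():
    print(f"5'-{primer}-3'  ({len(primer)} nt)")
```

Thymidine is not used as an anchor, since a 3' T would let the primer anneal anywhere within the poly(A) tract rather than only at its upstream boundary.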

Precipitate the DNA by adding 150 µl of cold 95% ethanol and wash the 
pellet twice with cold 70% ethanol. Dry the pellet for 5 minutes and then resuspend in 
MOPS buffer. Anneal the primers by heating the solution for 2 minutes at 65°C 
followed by slow cooling to room temperature for 15-30 minutes. Perform sequencing 
30 W. Prior to sample loading, the pooled and dried reaction products are resuspended 
in formamide/EDTA loading buffer by vortexing and then heated at 90°C. A sample 
sheet is created within the ABI data collection software on the Macintosh computer 
which indicates the number of samples loaded and the fluorescent-labeled mobility file 
to use for sequence data processing. After cleaning the sample wells with a syringe, the 
odd-numbered sequencing reactions are loaded into the respective wells using a 
micropipettor equipped with a flat-tipped gel-loading tip. The gel is then 
electrophoresed for 5 minutes before the wells are cleaned again and the even-numbered 
samples are loaded. The filter wheel used for dye-primers and dye-terminators is 
specified on the ABI 373A CPU. Typically electrophoresis and data collection are for 
10 hours at 30 W on the ABI 373A that is fitted with a heat-distributing aluminum plate. 
After data collection, an image file is created by the ABI software that relates the 
fluorescent signal detected to the corresponding scan number. The software then 
determines the sample lane positions based on the signal intensities. After the lanes are 
tracked, the cross-section of data for each lane is extracted and processed by baseline 
subtraction, mobility calculation, spectral deconvolution, and time correction. After 
processing, the sequence data files are transferred to a SPARCstation 2 using NFS 
Share. 

Protocol: prepare 8 M urea, 4.75% polyacrylamide gels, as described
above, using a 36-well comb. Prior to loading, clean the outer surface of the gel plates.
Assemble the gel plates into an ABI 373A DNA Sequencer (Foster City, CA) so that
the lower scan (usually the blue) line corresponds to an intensity value of 800-1000 as
displayed on the computer data collection window. If the baseline of four-color scan
lines is not flat, reclean the glass plates. Affix the aluminum heat distribution plate.
Pre-electrophorese the gel for 10-30 minutes. Prepare the samples for loading. Add 3
µl of FE to the bottom of each tube, vortex, heat at 90°C for 3 minutes, and centrifuge
to reclaim condensation. Flush the sample wells with electrophoresis buffer using a
syringe. Using flat-tipped gel loading pipette tips, load each odd-numbered sample.
Pre-electrophorese the gel for at least 5 minutes, flush the wells again, and then load
each even-numbered sample. Begin the electrophoresis (30 W for 10 hours). After data

To each reaction, add the following reagents and incubate for 10 minutes
at 37°C.

 4 µl   10X Mn2+/isocitrate buffer
 6 µl   ABI terminator mix
 2 µl   diluted Sequenase™ (3.25 U/µl)
 1 µl   2 mM [alpha]-S-dNTPs
22 µl

The undiluted SEQUENASE™ from United States Biochemicals is 13
U/µl and should be diluted 1:4 with USB dilution buffer prior to use. Add 60 µl 8 M
ammonium acetate and 300 µl 95% ethanol to stop the reaction and vortex. Precipitate
the DNA in an ice-water bath for 10 minutes. Centrifuge for 15 minutes at 10,000 x g in
a microcentrifuge at 4°C. Carefully decant the supernatant, and rinse the pellet by
adding 300 µl of 80% ethanol. Mix the sample and centrifuge again for 15 minutes, and
carefully decant the supernatant. Repeat the rinse step to ensure efficient removal of the
unincorporated terminators. Dry the DNA for 5-10 minutes (or until dry) in the
Speed-Vac.
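
The enzyme dilution stated above can be checked with a short calculation. This is only an illustrative sketch: it assumes that "1:4" means one part stock enzyme in four parts total volume, which is the reading consistent with the 3.25 U/µl figure given in the reagent list. The function names are hypothetical.

```python
# Illustrative arithmetic check for the enzyme dilution described above.
# Assumption: "1:4" means one part stock enzyme in four parts total volume.

STOCK_UNITS_PER_UL = 13.0   # undiluted enzyme activity, from the text
DILUTION_FACTOR = 4         # 1:4 dilution, from the text
ENZYME_VOLUME_UL = 2.0      # diluted enzyme added per reaction, from the reagent list


def diluted_activity(stock_units_per_ul, dilution_factor):
    """Activity (U/ul) after diluting the stock by the given total-volume factor."""
    return stock_units_per_ul / dilution_factor


def units_per_reaction(activity_units_per_ul, volume_ul):
    """Total enzyme units delivered by a given volume of diluted enzyme."""
    return activity_units_per_ul * volume_ul


if __name__ == "__main__":
    activity = diluted_activity(STOCK_UNITS_PER_UL, DILUTION_FACTOR)
    print(f"diluted activity: {activity} U/ul")
    print(f"units per reaction: {units_per_reaction(activity, ENZYME_VOLUME_UL)} U")
```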



E. Sequence gel preparation, pre-electrophoresis, sample loading, electrophoresis,
data collection, and analysis on the ABI 373A DNA sequencer

Polyacrylamide gels for DNA sequencing are prepared as described
above, except that the gel mix is filtered prior to polymerization. Glass plates are
carefully cleaned with hot water, distilled water, and ethanol to remove potential
fluorescent contaminants prior to taping. Denaturing 6% polyacrylamide gels are
poured into 0.3 mm x 89 cm x 52 cm taped plates and fitted with a 36 well comb. After
polymerization, the tape and the comb are removed from the gel and the outer surfaces
of the glass plates are cleaned with hot water, and rinsed with distilled water and
ethanol. The gel is assembled into an ABI sequencer, and then checked by
laser-scanning. If baseline alterations are observed on the ABI-associated Macintosh
computer display, the plates are recleaned. Subsequently, the buffer wells are attached,
electrophoresis buffer is added, and the gel is pre-electrophoresed for 10-30 minutes at

 7 µl   ABI terminator mix (Catalogue No. 401489)
 2 µl   diluted Sequenase™ (3.25 U/µl)
 1 µl   2 mM α-S dNTPs
22 µl

The undiluted Sequenase™ (Catalogue No. 70775, United States
Biochemicals, Cleveland, OH) is 13 U/µl and is diluted 1:4 with USB dilution buffer
prior to use. Add 20 µl 9.5 M ammonium acetate and 100 µl 95% ethanol to stop the
reaction and mix.

Precipitate the DNA in an ice-water bath for 10 minutes. Centrifuge for
15 minutes at 10,000 x g in a microcentrifuge at 4°C. Carefully decant the supernatant,
and rinse the pellet by adding 300 µl of 70-80% ethanol. Mix and centrifuge again for
15 minutes and carefully decant the supernatant.

Repeat the rinse step to ensure efficient removal of the unincorporated
terminators. Dry the DNA for 5-10 minutes (or until dry) in the Speed-Vac, and store
the dried reactions at -20°C.

For double-stranded reactions:

Add the following to a 1.5 ml microcentrifuge tube:

5 µl   ds DNA (5 µg)
4 µl   1 N NaOH
3 µl   ddH2O

Incubate the reaction at 65°C-70°C for 5 minutes, and then briefly
centrifuge to reclaim condensation. Add the following reagents to each reaction, vortex,
and briefly centrifuge:

3 µl   8 µM primer
9 µl   ddH2O
4 µl   MOPS-Acid buffer

Sequenase™ (USB, Cleveland, OH) catalyzed sequencing with labeled
terminators. Single-stranded terminator reactions require approximately 2 µg of phenol
extracted M13-based template DNA. The DNA is denatured and the primer annealed
by incubating DNA, primer, and buffer at 65°C. After the reaction has cooled to room
temperature, alpha-thio-deoxynucleotides, labeled terminators, and diluted Sequenase™
DNA polymerase are added and the mixture is incubated at 37°C. The reaction is
stopped by adding ammonium acetate and ethanol, and the DNA fragments are
precipitated and dried. To aid in the removal of unincorporated terminators, the DNA
pellet is rinsed twice with ethanol. The dried sequencing reactions can be stored up to
several days at -20°C.

Double-stranded terminator reactions require approximately 5 µg of
diatomaceous earth modified-alkaline lysis midi-prep purified plasmid DNA. The
double-stranded DNA is denatured by incubating the DNA in sodium hydroxide at
65°C, and after incubation, primer is added and the reaction is neutralized by adding an
acid-buffer. Reaction buffer, alpha-thio-deoxynucleotides, labeled dye-terminators, and
diluted Sequenase™ DNA polymerase then are added and the reaction is incubated at
37°C. Ammonium acetate is added to stop the reaction and the DNA fragments are
precipitated, rinsed, dried, and stored.

For single-stranded reactions:

Add the following to a 1.5 ml microcentrifuge tube:

 4 µl   ssDNA (2 µg)
 4 µl   0.8 µM primer
 2 µl   10x MOPS buffer
 2 µl   10x Mn2+/isocitrate buffer
12 µl

To denature the DNA and anneal the primer, incubate the reaction at
65°C-70°C for 5 minutes. Allow the reaction to cool at room temperature for 15
minutes, and then briefly centrifuge to reclaim condensation. To each reaction, add the
following reagents and incubate for 10 minutes at 37°C

apply gentle pressure to the column with a pipet bulb.) Insert the column into the wash
tube provided. Spin in a variable-speed microcentrifuge at 1300 x g for 2 minutes to
remove the fluid. Remove the column from the wash tube and insert it into a sample
collection tube. Carefully remove the reaction mixture (20 µl) and load it on top of the
gel material. If the samples were incubated in a cycling instrument that required
overlaying with oil, carefully remove the reaction from beneath the oil. Avoid picking
up oil with the sample, although small amounts of oil (<1 µl) in the sample will not
affect results. Oil at the end of the pipet tip containing the sample can be removed by
touching the tip carefully on a clean surface (e.g., the reaction tube). Use each column
only once. Spin in a variable-speed microcentrifuge with a fixed angle rotor, placing
the column in the same orientation as it was in for the first spin. Dry the sample in a
vacuum centrifuge. Do not apply heat or over dry. If desired, reactions can be
precipitated with ethanol.

D. Terminator Reaction Clean-Up via Sephadex G-50 Filled Microtiter Format
Filter Plates

Sephadex (Pharmacia, Piscataway, NJ) settles out; therefore, you must
resuspend before adding to the plate and also after filling every 8 to 10 wells. Add 400
µl of mixed Sephadex G-50 to each well of microtiter filter plate. Place microtiter filter
plate on top of a microtiter plate to collect water and tape sides so they do not fly apart
during centrifugation. Spin at 1500 rpm for 2 minutes. Discard water that has been
collected in the microtiter plate. Place the microtiter filter plate on top of a microtiter
plate to collect water and tape sides so they do not fly apart during centrifugation. Add
an additional 100-200 µl of Sephadex G-50 to fill the microtiter plate wells. Spin at
1500 rpm for 2 minutes. Discard water that has been collected in the microtiter plate.
Place the microtiter filter plate on top of a microtiter plate with tubes to collect water
and tape sides so they do not fly apart during centrifugation. Add 20 µl terminator
reaction to each Sephadex G-50 containing well. Spin at 1500 rpm for 2 minutes.
Place the collected effluent in a Speed-Vac for approximately 1-2 hours.

B. Taq-polymerase catalyzed cycle sequencing using MW-identifier-labeled
terminator reactions

One problem in DNA cycle sequencing is that when primers are used the
reaction conditions are such that the nested fragment set distribution is highly
dependent upon the template concentration in the reaction mix. It has recently been
observed that the nested fragment set distribution for the DNA cycle sequencing
reactions using the labeled terminators is much less sensitive to DNA concentration
than that obtained with the labeled primer reactions as described above. In addition, the
terminator reactions require only one reaction tube per template while the labeled
primer reactions require one reaction tube for each of the four terminators. The protocol
described below is easily interfaced with the 96 well template isolation and 96 well
reaction clean-up procedures also described herein.

Place 0.5 µg of single-stranded or 1 µg of double-stranded DNA in 0.2
ml PCR tubes. Add 1 µl (for single-stranded templates) or 4 µl (for double-stranded
templates) of 0.8 µM primer and 9.5 µl of ABI supplied premix to each tube, and bring
the final volume to 20 µl with ddH2O. Centrifuge briefly and cycle as usual using the
terminator program as described by the manufacturer (i.e., preheat at 96°C followed by
25 cycles of 96°C for 15 seconds, 50°C for 1 second, 60°C for 4 minutes, and then link
to a 4°C hold). Proceed with the spin column purification using either the Centri-Sep
columns (Amicon, Beverly, MA) or G-50 microtiter plate procedures given below.
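
For planning purposes, the programmed hold time of the terminator cycling program quoted above can be totalled as follows. This is a rough sketch only: it counts the stated hold times and ignores the initial preheat, the temperature ramps, and the final 4°C hold, none of which are quantified in the text. The names used are hypothetical.

```python
# Illustrative sketch: total programmed hold time for the cycling program above.
# Ramp times, the initial preheat, and the final 4 C hold are NOT included,
# because the text gives no durations for them.

CYCLES = 25
STEPS_SECONDS = (
    ("96 C denaturation", 15),
    ("50 C annealing", 1),
    ("60 C extension", 4 * 60),
)


def hold_time_seconds(cycles=CYCLES, steps=STEPS_SECONDS):
    """Sum of the per-step hold times multiplied by the number of cycles."""
    return cycles * sum(seconds for _, seconds in steps)


if __name__ == "__main__":
    total = hold_time_seconds()
    print(f"{total} s of programmed holds (about {total / 60:.1f} min)")
```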

C. Terminator Reaction Clean-Up via Centri-Sep Columns

A column is prepared by gently tapping the column to cause the gel
material to settle to the bottom of the column. The column stopper is removed and 0.75
ml dH2O is added. Stopper the column and invert it several times to mix. Allow the gel
to hydrate for at least 30 minutes at room temperature. Columns can be stored for a few
days at 4°C. Allow columns that have been stored at 4°C to warm to room temperature
before use. Remove any air bubbles by inverting the column and allowing the gel to
settle. Remove the upper-end cap first and then remove the lower-end cap. Allow the
column to drain completely, by gravity. (Note: If flow does not begin immediately

thin-walled reaction tube, corresponding to the A, C, G, and T reactions, and then an
aliquot of the respective reaction mixes is added to the side of each tube. These tubes
are part of a 96-tube/retainer set tray in a microtiter plate format, which fits into a
Perkin Elmer Cetus Cycler 9600 (Foster City, CA). Strip caps are sealed onto the
tube/retainer set and the plate is centrifuged briefly. The plate then is placed in the
cycler whose heat block has been preheated to 95°C, and the cycling program is
immediately started. The cycling protocol consists of 15-30 cycles of: 95°C
denaturation; 55°C annealing; 72°C extension; 95°C denaturation; 72°C extension;
95°C denaturation, and 72°C extension, linked to a 4°C final soak file.

At this stage, the reactions may be frozen and stored at -20°C for up to
several days. Prior to pooling and precipitation, the plate is centrifuged briefly to
reclaim condensation. The primer and base-specific reactions are pooled into ethanol,
and the precipitated DNA is collected by centrifugation and dried. These sequencing
reactions can be stored for several days at -20°C.

The protocol for the sequencing reactions is as follows. For A and C
reactions, 1 µl, and for G and T reactions, 2 µl of each DNA sample (100 ng/µl for M13
templates and 200 ng/µl for pUC templates) are pipetted into the bottom of the 0.2 ml
thin-walled reaction tubes. AmpliTaq polymerase (N801-0060) is from Perkin-Elmer
Cetus (Foster City, CA).

A mix of 30 µl AmpliTaq (5 U/µl), 30 µl 5X Taq reaction buffer, and 130 µl
ddH2O is prepared, giving 190 µl of diluted Taq for 24 clones.

A, C, G, and T base specific mixes are prepared by adding base-specific
primer and diluted Taq to each of the base specific nucleotide/buffer premixes:

A,C/G,T

 60/120 µl   5X Taq cycle sequencing mix
  30/60 µl   diluted Taq polymerase
  30/60 µl   respective fluorescent end-labeled primer
120/240 µl

termination
reaction       polyacrylamide gel                electrophoresis conditions

short          5%, 0.15 mm x 50 cm x 20 cm       2.25 hours at 22 mA
long           4%, 0.15 mm x 70 cm x 20 cm       8-9 hours at 15 mA
long           4%, 0.15 mm x 70 cm x 20 cm       20-24 hours at 15 mA

Each base-specific sequencing reaction terminated with the short
termination mix is loaded onto a 0.15 mm x 50 cm x 20 cm denaturing 5%
polyacrylamide gel; reactions terminated with the long termination mix typically are
divided in half and loaded onto two 0.15 mm x 70 cm x 20 cm denaturing 4%
polyacrylamide gels.

After electrophoresis, buffer is removed from the wells, the tape is
removed, and the gel plates separated. The gel is transferred to a 40 cm x 20 cm sheet
of 3 MM Whatman paper, covered with plastic wrap, and dried on a Hoefer (San
Francisco, CA) gel dryer for 25 minutes at 80°C. The dried gel is exposed to Kodak
(New Haven, CT) XRP-1 film. Depending on the intensity of the signal and whether
the radiolabel is 32P or 35S, exposure times vary from 4 hours to several days. After
exposure, films are developed by processing in developer and fixer solutions, rinsed
with water, and air dried. The autoradiogram is then placed on a light-box, the
sequence is manually read, and the data typed into a computer.

Taq-polymerase catalyzed cycle sequencing using labeled primers. Each
base-specific cycle sequencing reaction routinely includes approximately 100 or 200 ng
isolated single-stranded DNA for A and C or G and T reactions, respectively.
Double-stranded cycle sequencing reactions similarly contain approximately 200 or 400 ng of
plasmid DNA isolated using either the standard alkaline lysis or the diatomaceous
earth-modified alkaline lysis procedures. All reagents except template DNA are added in one
pipetting step from a premix of previously aliquoted stock solutions stored at -20°C.
Reaction premixes are prepared by combining reaction buffer with the base-specific
nucleotide mixes. Prior to use, the base-specific reaction premixes are thawed and
combined with diluted Taq DNA polymerase and the individual end-labeled universal
primers to yield the final reaction mixes. Once the above mixes are prepared, four
aliquots of single or double-stranded DNA are pipetted into the bottom of each 0.2 ml

                                  4%          5%          6%

urea                              48 g        48 g        48 g
40% acrylamide/bisacrylamide      10 ml       12.5 ml     15 ml
10X MTBE                          10 ml       10 ml       10 ml
ddH2O                             42 ml       39.5 ml     37 ml
15% APS                           500 µl      500 µl      500 µl
TEMED                             50 µl       50 µl       50 µl
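
The acrylamide volumes in the recipe above follow from a simple dilution of the 40% stock into the 100 ml recipe volume. The sketch below illustrates that arithmetic; the function names are hypothetical, and the calculation assumes the final gel volume is exactly 100 ml as stated for the recipes.

```python
# Illustrative sketch of the dilution arithmetic behind the gel recipes above.
# Assumption: the final gel mix volume is 100 ml, as stated for the recipes.

STOCK_PERCENT = 40.0      # acrylamide/bisacrylamide stock, from the recipe
FINAL_VOLUME_ML = 100.0   # recipe volume, from the text


def final_percent(stock_ml, stock_percent=STOCK_PERCENT,
                  final_volume_ml=FINAL_VOLUME_ML):
    """Final polyacrylamide percentage for a given volume of stock."""
    return stock_percent * stock_ml / final_volume_ml


def stock_volume_ml(target_percent, stock_percent=STOCK_PERCENT,
                    final_volume_ml=FINAL_VOLUME_ML):
    """Volume of stock needed to reach a target polyacrylamide percentage."""
    return target_percent * final_volume_ml / stock_percent


if __name__ == "__main__":
    for stock_ml in (10.0, 12.5, 15.0):
        print(f"{stock_ml} ml of stock -> {final_percent(stock_ml)}% gel")
```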



Urea (5505UA) is obtained from Gibco/BRL (Gaithersburg, MD). All
other materials are obtained from Fisher (Fair Lawn, NJ). Briefly, urea, MTBE buffer
and water are combined, incubated for 5 minutes at 55°C, and stirred to dissolve the
urea. The mixture is cooled briefly, acrylamide/bis-acrylamide solution is added and
mixed, and the entire mixture is degassed under vacuum for 5 minutes. APS and
TEMED polymerization agents are added with stirring. The complete gel mix is
immediately poured in between the taped glass plates with 0.15 mm spacers. Plates are
prepared by first cleaning with ALCONOX™ (New York, NY) detergent and hot water,
are rinsed with double distilled water, and dried. Typically, the notched glass plate is
treated with a silanizing reagent and then rinsed with double distilled water. After
pouring, the gel is immediately laid horizontally, the well-forming comb is inserted and
clamped into place, and the gel is allowed to polymerize for at least 30 minutes. Prior to
loading, the tape around the bottom of the gel and the well-forming comb are removed.
A vertical electrophoresis apparatus is then assembled by clamping the upper and lower
buffer chambers to the gel plates, and adding 1X MTBE electrophoresis buffer to the
chambers. Sample wells are flushed with a syringe containing running buffer, and
immediately prior to loading each sample, the well is flushed with running buffer using
gel loading tips to remove urea. One to two microliters of sample is loaded into each
well using a Pipetman (Rainin, Emeryville, CA) with gel-loading tips, and then
electrophoresed according to the following guidelines (during electrophoresis, the gel is
cooled with a fan):

219.3
221.2
228.2
234.2
237.4
241.4

The data indicate that 22 of 31 compounds appeared in the spectrum with
the anticipated mass, and 9 of 31 compounds appeared in the spectrum with a +H mass
(1 atomic mass unit, amu, over the anticipated mass). The latter phenomenon is probably
due to the protonation of an amine within the compounds. Therefore 31 of 31
compounds are detected by MALDI Mass Spectroscopy. More importantly, the
example demonstrates that multiple tags can be detected simultaneously by a
spectroscopic method.
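
The distinction drawn above between a peak at the anticipated mass and a peak at the +H mass can be sketched as a simple matching rule. This is an illustration only: the matching tolerance is an assumption (the text does not state one), the function name is hypothetical, and the pairing of the observed 152.2 and 186.2 peaks with 4-methoxybenzamide and 1-naphthaleneacetamide in the usage lines is inferred from the listed values and is not stated explicitly in the source.

```python
# Illustrative sketch: classify an observed m/z value against an anticipated mass.
# TOLERANCE_AMU is an assumed matching window; the source does not specify one.

PROTON_SHIFT_AMU = 1.0   # "+H" shift of 1 amu, as stated in the text
TOLERANCE_AMU = 0.2      # assumption for illustration


def classify_peak(observed_mz, anticipated_mass, tolerance=TOLERANCE_AMU):
    """Return 'M' for a match at the anticipated mass, 'M+H' for a match one
    amu higher, or None if the peak matches neither."""
    delta = observed_mz - anticipated_mass
    if abs(delta) <= tolerance:
        return "M"
    if abs(delta - PROTON_SHIFT_AMU) <= tolerance:
        return "M+H"
    return None


if __name__ == "__main__":
    # Pairings below the first line are inferred, not stated in the source.
    print(classify_peak(121.1, 121.14))   # benzamide
    print(classify_peak(152.2, 151.17))   # 4-methoxybenzamide (assumed pairing)
    print(classify_peak(186.2, 185.23))   # 1-naphthaleneacetamide (assumed pairing)
```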

The alpha-cyano matrix alone (Figure 12) gave peaks at 146.2, 164.1,
172.1, 173.1, 189.1, 190.1, 191.1, 192.1, 212.1, 224.1, 228.0, 234.3. Other identified
masses in the spectrum are due to contaminants in the purchased compounds as no
effort was made to further purify the compounds.

EXAMPLE 17

A Procedure for Sequencing with MW-Identifier-Labeled Primers,
Radiolabeled Primers, MW-Identifier-Labeled-Dideoxy-Terminators,
Fluorescent-Primers and Fluorescent-Dideoxy-Terminators

A. Preparation of sequencing gels and electrophoresis

The protocol is as follows. Prepare 8 M urea, polyacrylamide gels
according to the following recipes (100 ml) for 4%, 6%, or 8% polyacrylamide.

158.3
         4-amino-5-imidazole-carboxyamide (162.58)
163.3
165.2 -> 3,4-pyridine-dicarboxyamide (165.16)
165.2 -> 4-ethoxybenzamide (165.19)
166.2 -> 2,3-pyrazinedicarboxamide (166.14)
166.2 -> 2-nitrobenzamide (166.14)
         3-fluoro-4-methoxybenzoic acid (170.4)
171.1
172.2
173.4
         indole-3-acetamide (174.2)
178.3
179.3 -> 5-acetylsalicylamide (179.18)
181.2 -> 3,5-dimethoxybenzamide (181.19)
182.2 ->
186.2
         1-naphthaleneacetamide (185.23)
         8-chloro-3,5-diamino-2-pyrazinecarboxyamide (187.59)
188.2
189.2 -> 4-trifluoromethyl-benzamide (189.00)
190.2
191.2
192.3
         5-amino-5-phenyl-4-pyrazole-carboxamide (202.22)
203.2
203.4
         1-methyl-2-benzyl-malonamate (207.33)
         4-amino-2,3,5,6-tetrafluorobenzamide (208.11)
212.2 -> 2,3-naphthalenedicarboxylic acid (212.22).

naphthalenedicarboxylic acid (212.22). The compounds are placed in DMSO at the
concentration described above. One µl of the material is then mixed with
alpha-cyano-4-hydroxy cinnamic acid matrix (after a 1:10,000 dilution) and deposited
on to a solid stainless steel support.

The material is then desorbed by a laser using the Protein TOF Mass
Spectrometer (Bruker, Manning Park, MA) and the resulting ions are measured in both
the linear and reflectron modes of operation. The following m/z values are observed
(Figure 11):

121.1 -> benzamide (121.14)
122.1 -> nicotinamide (122.13)
123.1 -> pyrazinamide (123.12)
124.1
125.2
127.3 -> 3-amino-4-pyrazolecarboxylic acid (127.10)
127.2 -> 2-thiophenecarboxamide (127.17)
135.1 -> 4-aminobenzamide (135.15)
135.1 -> tolumide (135.17)
136.2 -> 6-methylnicotinamide (136.15)
137.1 -> 3-aminonicotinamide (137.14)
138.2 -> nicotinamide N-oxide (138.12)
138.2 -> 3-hydropicolinamide (138.13)
139.2 -> 4-fluorobenzamide (139.13)
140.2
147.3 -> cinnamamide (147.18)
148.2
149.2
         4-methoxybenzamide (151.17)
152.2

         2,6-difluorobenzamide (157.12)

oxide derivative, 227.1 amu indicates the 2-nitrobenzamide, 179.18 amu indicates the
5-acetylsalicylic acid derivative, 226.1 amu indicates the 4-ethoxybenzoic acid
derivative, 209.1 amu indicates the cinnamic acid derivative, and 198.1 amu indicates
the 3-aminonicotinic acid, the first sequence can be deduced as -5'-TATGCA-3'- and
the second sequence can be deduced as -5'-CGTACC-3'-. Thus, it is possible to
sequence more than one DNA sample per separation step.
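
The step of reading the temporal order of tags out of the fraction data can be sketched as follows. This is a minimal illustration, not the procedure used in the example: the fraction contents are transcribed from the table of observed MWs in this example (fractions 29 to 50, the only ones with tags), the assignment of six tags to each set is an assumption based on the two six-base sequences stated above, and the function and variable names are hypothetical.

```python
# Illustrative sketch: recover the order in which each tag first appears
# across the collected fractions, separately for two tag sets.
# Set membership is an assumption based on the two six-base sequences in the text.

OBSERVED = {
    29: [212.1], 30: [212.1], 31: [212.1],
    32: [212.1, 199.1], 33: [212.1, 199.1],
    34: [212.1, 200.1, 199.1, 227.1],
    35: [200.1, 199.1, 227.1], 36: [200.1, 227.1],
    37: [200.1, 227.1, 179.18],
    38: [200.1, 196.1, 179.18], 39: [200.1, 196.1, 179.18],
    40: [196.1, 179.18, 226.1], 41: [196.1, 226.1],
    42: [196.1, 182.1, 226.1],
    43: [182.1, 226.1, 209.1], 44: [182.1, 209.1],
    45: [182.1, 235.2, 209.1, 198.1],
    46: [235.2, 198.1], 47: [235.2, 198.1],
    48: [235.2, 198.1, 218.1],
    49: [218.1], 50: [218.1],
}

SET_1 = {212.1, 200.1, 196.1, 182.1, 235.2, 218.1}
SET_2 = {199.1, 227.1, 179.18, 226.1, 209.1, 198.1}


def order_of_first_appearance(fractions, tag_set):
    """List the tags of one set in the order of the fraction where each first appears."""
    ordered = []
    for fraction in sorted(fractions):
        for mw in fractions[fraction]:
            if mw in tag_set and mw not in ordered:
                ordered.append(mw)
    return ordered


if __name__ == "__main__":
    print("set 1:", order_of_first_appearance(OBSERVED, SET_1))
    print("set 2:", order_of_first_appearance(OBSERVED, SET_2))
```

Once the tags of each set are ordered in this way, each tag is translated to the nucleotide it is correlated with, which gives one sequence per set from a single separation.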



EXAMPLE 16
Demonstration of the Simultaneous Detection of
Multiple Tags by Mass Spectrometry

This example provides a description of the ability to simultaneously
detect multiple compounds (tags) by mass spectrometry. In this particular example, 31
compounds are mixed with a matrix, deposited and dried on to a solid support and then
desorbed with a laser. The resultant ions are then introduced in a mass spectrometer.

The following compounds (purchased from Aldrich, Milwaukee, WI) are
mixed together on an equal molar basis to a final concentration of 0.002 M (on a per
compound basis): benzamide (121.14), nicotinamide (122.13), pyrazinamide (123.12),
3-amino-4-pyrazolecarboxylic acid (127.10), 2-thiophenecarboxamide (127.17),
4-aminobenzamide (135.15), tolumide (135.17), 6-methylnicotinamide (136.15),
3-aminonicotinamide (137.14), nicotinamide N-oxide (138.12), 3-hydropicolinamide
(138.13), 4-fluorobenzamide (139.13), cinnamamide (147.18), 4-methoxybenzamide
(151.17), 2,6-difluorobenzamide (157.12), 4-amino-5-imidazole-carboxyamide (162.58),
3,4-pyridine-dicarboxyamide (165.16), 4-ethoxybenzamide (165.19),
2,3-pyrazinedicarboxamide (166.14), 2-nitrobenzamide (166.14),
3-fluoro-4-methoxybenzoic acid (170.4), indole-3-acetamide (174.2), 5-acetylsalicylamide
(179.18), 3,5-dimethoxybenzamide (181.19), 1-naphthaleneacetamide (185.23),
8-chloro-3,5-diamino-2-pyrazinecarboxyamide (187.59), 4-trifluoromethyl-benzamide
(189.00), 5-amino-5-phenyl-4-pyrazole-carboxamide (202.22),
1-methyl-2-benzyl-malonamate (207.33), 4-amino-2,3,5,6-tetrafluorobenzamide (208.11), 2,3-

Fraction #   Time   MWs        Fraction #   Time   MWs

10           10     none       40           40     196.1, 179.18, 226.1
11           11     none       41           41     196.1, 226.1
12           12     none       42           42     196.1, 182.1, 226.1
13           13     none       43           43     182.1, 226.1, 209.1
14           14     none       44           44     182.1, 209.1
15           15     none       45           45     182.1, 235.2, 209.1, 198.1
16           16     none       46           46     235.2, 198.1
17           17     none       47           47     235.2, 198.1
18           18     none       48           48     235.2, 198.1, 218.1
19           19     none       49           49     218.1
20           20     none       50           50     218.1
21           21     none       51           51     none
22           22     none       52           52     none
23           23     none       53           53     none
24           24     none       54           54     none
25           25     none       55           55     none
26           26     none       56           56     none
27           27     none       57           57     none
28           28     none       58           58     none
29           29     212.1      59           59     none
30           30     212.1      60           60     none



The temporal appearance of the tags for set #1 is 212.1, 200.1, 196.1,
182.1, 235.2, 218.1, 199.1, 227.1, and the temporal appearance of tags for set #2 is
199.1, 227.1, 179.1, 226.1, 209.1, 198.1. Since 212.1 amu indicates the
4-methoxybenzoic acid derivative, 200.1 indicates the 4-fluorobenzoic acid derivative,
196.1 amu indicates the toluic acid derivative, 182.1 amu indicates the benzoic acid
derivative, 235.2 amu indicates the indole-3-acetic acid derivative, 218.1 amu
indicates the 2,6-difluorobenzoic derivative, 199.1 amu indicates the nicotinic acid N-

the sample well to the detection region, prepared according to the manufacturer's
instructions. DNA sequencing reactions are prepared as described by the manufacturer
utilizing Taq polymerase (Promega Corp., Madison, WI) and are performed on
M13mp19 single-stranded DNA template prepared by standard procedures. Sequencing
reactions are stored at -20°C in the dark and heated at 90°C for 3 min in formamide just
prior to sample loading. They are loaded on the 370A with a pipetman according to the
manufacturer's instructions and on the CE by electrokinetic injection at 10,000 V for 10
seconds. Ten µl fractions are collected during the run by removing all the liquid at
the bottom electrode and replacing it with new electrolyte.

To cleave the tag from the oligonucleotide, 100 µl of 0.05 M
dithiothreitol (DTT) is added to each fraction. Incubation is for 30 minutes at room
temperature. NaCl is then added to 0.1 M and 2 volumes of EtOH are added to
precipitate the ODNs. The ODNs are removed from solution by centrifugation at
14,000 x g at 4°C for 15 minutes. The supernatants are reserved and dried to completeness
under a vacuum with centrifugation. The pellets are then dissolved in 25 µl MeOH.
The pellet is then tested by mass spectrometry for the presence of tags. The same
MALDI technique is employed as described in Example 4. The following MWs (tags)
are observed in the mass spectra as a function of time:



Fraction #   Time   MWs        Fraction #   Time   MWs

1            1      none       31           31     212.1
2            2      none       32           32     212.1, 199.1
3            3      none       33           33     212.1, 199.1
4            4      none       34           34     212.1, 200.1, 199.1, 227.1
5            5      none       35           35     200.1, 199.1, 227.1
6            6      none       36           36     200.1, 227.1
7            7      none       37           37     200.1, 227.1, 179.18
8            8      none       38           38     200.1, 196.1, 179.18
9            9      none       39           39     200.1, 196.1, 179.18

Preparation of Gel-Filled Capillaries

Fifty-centimeter fused silica capillaries (375 µm o.d., 50 µm i.d.,
Polymicro Technologies, Phoenix, AZ) with detector windows (where the polyimide
coating has been removed from the capillary) at 25 cm are used in the separations. The
inner surface of the capillaries is derivatized with
(methacryloxypropyl)trimethoxysilane (MAPS) (Sigma, St. Louis, MO) to permit
covalent attachment of the gel to the capillary wall (Nashabeh et al., Anal Chem
65:2148, 1994). Briefly, the capillaries are cleaned by successively flowing
trifluoroacetic acid, deionized water, and acetone through the column. After the acetone
wash, a 0.2% solution of MAPS in 50/50 water/ethanol solution is passed through the
capillary and left at room temperature for 30 min. The solution is removed by
aspiration and the tubes are dried for 30 min under an infrared heat lamp.

Gel-filled capillaries are prepared under high pressure by a modification
of the procedure described by Huang et al. (J. Chromatography 600:289, 1992). Four
percent poly(acrylamide) gels with 5% cross linker and 8.3 M urea are used for all the
studies reported here. A stock solution is made by dissolving 3.8 g of acrylamide, 0.20
g of N,N'-methylenebis(acrylamide), and 50 g of urea into 100 mL of TBE buffer (90
mM Tris borate, pH 8.3, 0.2 mM EDTA). Cross linking is initiated with 10 mL of
N,N,N',N'-tetramethylethylenediamine (TEMED) and 250 mL of 10% ammonium
persulfate solution. The polymerizing solution is quickly passed into the derivatized
column. Filled capillaries are then placed in a steel tube 1 m x 1/8 in. i.d. x 1/4 in.
o.d. filled with water, and the pressure is raised to 400 bar by using an HPLC pump and
maintained at that pressure overnight. The pressure is gradually released and the
capillaries are removed. A short section of capillary from each end of the column is
removed before use.
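
The stated gel composition (4% gel, 5% cross linker, 8.3 M urea) can be checked against the stock solution quantities with the usual %T and %C definitions. The sketch below is illustrative only: it treats the solution volume as the stated 100 mL of buffer and ignores the volume added by the dissolved solids, and the function names are hypothetical.

```python
# Illustrative check of the capillary gel composition from the stock quantities.
# Assumption: solution volume is taken as the 100 mL of buffer; the volume
# contributed by the dissolved solids is ignored.

UREA_MW_G_PER_MOL = 60.06   # molar mass of urea


def percent_t(acrylamide_g, bis_g, volume_ml):
    """Total monomer concentration, % w/v (acrylamide plus cross linker)."""
    return 100.0 * (acrylamide_g + bis_g) / volume_ml


def percent_c(acrylamide_g, bis_g):
    """Cross linker as a percentage of total monomer mass."""
    return 100.0 * bis_g / (acrylamide_g + bis_g)


def urea_molarity(urea_g, volume_ml):
    """Urea concentration in mol/L."""
    return urea_g / UREA_MW_G_PER_MOL / (volume_ml / 1000.0)


if __name__ == "__main__":
    print(f"%T = {percent_t(3.8, 0.20, 100.0):.1f}")
    print(f"%C = {percent_c(3.8, 0.20):.1f}")
    print(f"urea = {urea_molarity(50.0, 100.0):.1f} M")
```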

Separation and Detection of DNA Fragments

Analysis of DNA sequencing reactions separated by conventional
electrophoresis is performed on an ABI 370A DNA sequencer. This instrument uses a
slab denaturing urea poly(acrylamide) gel 0.4 mm thick with a distance of 26 cm from

Step B. To a solution of one of the 5'-[6-(4,6-dichloro-1,3,5-triazin-2-
ylamino)hexyl]oligonucleotides (compounds XII₁₋₃₆) at a concentration of 1 mg/ml in
100 mM sodium borate (pH 8.3) is added a 100-fold molar excess of a primary amine
selected from R₁₋₃₆-Lys(ε-iNIP)-ANP-Lys(ε-NH₂)-NH₂ (compounds X₁₋₃₆ from Example
11). The solution is mixed overnight at ambient temperature. The unreacted amine is
removed by ultrafiltration through a 3000 MW cutoff membrane (Amicon, Beverly,
MA) using H₂O as the wash solution (3 X). The compounds XIII₁₋₃₆ are isolated by
reduction of the volume to 100 mL.



EXAMPLE 15

Demonstration of Sequencing Using a CE Separation Method, Collecting
Fractions, Cleaving the Tag, Determining the Mass (and thus the Identity) of
the Tag and then Deducing the Sequence.

In this example, two DNA samples are sequenced in a single separation
method.

CE Instrumentation 

The CE instrument is a breadboard version of the instrument available
commercially from Applied Biosystems, Inc. (Foster City, CA). It consists of Plexiglas
boxes enclosing two buffer chambers, which can be maintained at constant temperature
with a heat control unit. The voltage necessary for electrophoresis is provided by a
high-voltage power supply (Gamma High Voltage Research, Ormond Beach, FL) with a
magnetic safety interlock, and a control unit to vary the applied potential. Sample
injections for open tube capillaries are performed by use of a hand vacuum pump to
generate a pressure differential across the capillary (vacuum injection). For gel-filled
capillaries, samples are electrophoresed into the tube by application of an electric field
(electrokinetic injection).
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EXAMPLE 13 

Preparation of a Set of Compounds 
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-5'-AH-ODN

Figure 9 illustrates the parallel synthesis of a set of 36 T-L-X compounds 
(X = MOI, where MOI is a nucleic acid fragment, ODN) derived from the esters of 
Example 7 (the same procedure could be used with other T-L-X compounds wherein X 
is an activated ester). The MOI is conjugated to T-L through the 5' end of the MOI, via 
a phosphodiester - alkyleneamine group. 

Referring to Figure 9: 

Step A. Compounds XII₁₋₃₆ are prepared according to a modified biotinylation
procedure in Van Ness et al., Nucleic Acids Res., 19, 3345 (1991). To a solution of one
of the 5'-aminohexyl oligonucleotides (compounds XI₁₋₃₆, 1 mg) in 200 mM sodium
borate (pH 8.3, 250 mL) is added one of the tetrafluorophenyl esters (compounds X₁₋₃₆
from Example A, 100-fold molar excess in 250 mL of NMP). The reaction is incubated
overnight at ambient temperature. The unreacted and hydrolyzed tetrafluorophenyl
esters are removed from the compounds XII₁₋₃₆ by Sephadex G-50 chromatography.

EXAMPLE 14 
Preparation of a Set of Compounds 
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-Lys(ε-(MCT-5'-AH-ODN))-NH₂

Figure 10 illustrates the parallel synthesis of a set of 36 T-L-X 
compounds (X = MOI, where MOI is a nucleic acid fragment, ODN) derived from the 
amines of Example 11 (the same procedure could be used with other T-L-X compounds
wherein X is an amine). The MOI is conjugated to T-L through the 5' end of the MOI, 
via a phosphodiester - alkyleneamine group. 

Referring to Figure 10: 

Step A. The 5'-[6-(4,6-dichloro-1,3,5-triazin-2-ylamino)hexyl]oligonucleotides XII₁₋₃₆
are prepared as described in Van Ness et al., Nucleic Acids Res., 19, 3345 (1991).
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EXAMPLE 12 
Preparation of a Set of Compounds 
of the Formula R₁₋₃₆-Lys(ε-Tfa)-Lys(ε-iNIP)-ANP-Tfp

Figure 8 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lh), where Lh is an activated ester (specifically, tetrafluorophenyl ester), L2 is an
ortho-nitrobenzylamine group with L3 being a methylene group that links Lh and L2, T
has a modular structure wherein the carboxylic acid group of a first lysine has been
joined to the nitrogen atom of the L2 benzylamine group to form an amide bond, a mass
spec sensitivity enhancer group (introduced via N-methylisonipecotic acid) is bonded
through the ε-amino group of the first lysine, a second lysine molecule has been joined
to the first lysine through the α-amino group of the first lysine, a molecular weight
adjuster group (having a trifluoroacetyl structure) is bonded through the ε-amino group
of the second lysine, and a variable weight component R₁₋₃₆ (where these R groups
correspond to T2 as defined herein, and may be introduced via any of the specific
carboxylic acids listed herein) is bonded through the α-amino group of the second
lysine. Referring to Figure 8:

Steps A-E. These steps are identical to steps A-E in Example 7.

Step F. The resin (compound VI) is treated with piperidine as described in step B in
Example 7 to remove the FMOC group.

Step G. The deprotected resin (compound VII) is coupled to Fmoc-Lys(Tfa)-OH using
the coupling method described in step C of Example 7 to give compound VIII.

Steps H-K. The resin (compound VIII) is treated as in steps F-J in Example 7 to give
compounds XI₁₋₃₆.
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EXAMPLE 1 1 
Preparation of a Set of Compounds 
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-Lys(ε-NH₂)-NH₂

Figure 7 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lh), where Lh is an amine (specifically, the ε-amino group of a lysine-derived
moiety), L2 is an ortho-nitrobenzylamine group with L3 being a carboxamido-
substituted alkyleneaminoacylalkylene group that links Lh and L2, T has a modular
structure wherein the carboxylic acid group of lysine has been joined to the nitrogen
atom of the L2 benzylamine group to form an amide bond, and a variable weight
component R₁₋₃₆ (where these R groups correspond to T2 as defined herein, and may be
introduced via any of the specific carboxylic acids listed herein) is bonded through the
α-amino group of the lysine, while a mass spec sensitivity enhancer group (introduced
via N-methylisonipecotic acid) is bonded through the ε-amino group of the lysine.

Referring to Figure 7:

Step A. Fmoc-Lys(Boc)-SRAM Resin (available from ACT; compound I) is mixed
with 25% piperidine in DMF and shaken for 5 min. The resin is filtered, then mixed
with 25% piperidine in DMF and shaken for 10 min. The solvent is removed, the resin
washed with NMP (2X), MeOH (2X), and DMF (2X), and used directly in step B.

Step B. To the resin (compound II) are added ANP (available from ACT; 3 eq.), HATU
(3 eq.) and NMM (7.5 eq.) in DMF, and the collection vessel is shaken for 1 hr. The
solvent is removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X).
The coupling of I to the resin and the wash steps are repeated, to give compound III.

Steps C-J. The resin (compound III) is treated as in steps B-I in Example 7 to give
compounds X₁₋₃₆.
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has a modular structure wherein the a-carboxylic acid group of glutamatic acid has 
been joined to the nitrogen atom of the L2 benzylamine group to form an amide bond,
and a variable weight component R₁₋₃₆ (where these R groups correspond to T2 as
defined herein, and may be introduced via any of the specific carboxylic acids listed
herein) is bonded through the α-amino group of the glutamic acid, while a mass spec
sensitivity enhancer group (introduced via 2-(diisopropylamino)ethylamine) is bonded
through the γ-carboxylic acid of the glutamic acid.

Referring to Figure 6:

Steps A-B. Same as in Example 7.

Step C. The deprotected resin (compound III) is coupled to Fmoc-Glu-(OAl)-OH using
the coupling method described in step C of Example 7 to give compound IV.

Step D. The allyl ester on the resin (compound IV) is washed with CH₂Cl₂ (2X) and
mixed with a solution of (PPh₃)₄Pd(0) (0.3 eq.) and N-methylaniline (3 eq.) in CH₂Cl₂.
The mixture is shaken for 1 hr. The solvent is removed and the resin is washed with
CH₂Cl₂ (2X). The palladium step is repeated. The solvent is removed and the resin is
washed with CH₂Cl₂ (2X), N,N-diisopropylethylammonium diethyldithiocarbamate in
DMF (2X), DMF (2X) to give compound V.

Step E. The deprotected resin from step D is suspended in DMF and activated by
mixing HATU (3 eq.), and NMM (7.5 eq.). The vessels are shaken for 15 minutes. The
solvent is removed and the resin washed with NMP (1X). The resin is mixed with 2-
(diisopropylamino)ethylamine (3 eq.) and NMM (7.5 eq.). The vessels are shaken for 1
hour. The coupling of 2-(diisopropylamino)ethylamine to the resin and the wash steps
are repeated, to give compound VI.

Steps F-J. Same as in Example 7.
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ortho-nitrobenzylamine group with L 3 being a methylene group that links L h and L\ T 
has a modular structure wherein the carboxylic acid group of lysine has been joined to
the nitrogen atom of the L2 benzylamine group to form an amide bond, and a variable
weight component R₁₋₃₆ (where these R groups correspond to T2 as defined herein, and
may be introduced via any of the specific carboxylic acids listed herein) is bonded
through the ε-amino group of the lysine, while a mass spec sensitivity enhancer group
(introduced via N-methylisonipecotic acid) is bonded through the α-amino group of the
lysine.

Referring to Figure 5:

Steps A-C. Same as in Example 7.

Step D. The resin (compound IV) is treated with piperidine as described in step B of
Example 7 to remove the FMOC group.

Step E. The deprotected α-amine on the resin in step D is coupled with N-
methylisonipecotic acid as described in step C of Example 7 to give compound V.

Step F. Same as in Example 7.

Step G. The resin (compounds VI₁₋₃₆) is treated with palladium as described in step D
of Example 7 to remove the Aloc group.

Steps H-J. The compounds X₁₋₃₆ are prepared in the same manner as in Example 7.



EXAMPLE 10

Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Glu(γ-DIAEA)-ANP-Tfp

Figure 6 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lh), where Lh is an activated ester (specifically, tetrafluorophenyl ester), L2 is an
ortho-nitrobenzylamine group with L3 being a methylene group that links Lh and L2, T
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EtOAc, washed with 5% aq. NaHCO, (3X), dried over Na 2 S0 4 , filtered, and evaporated 
in vacuo, providing compounds X₁₋₃₆.
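
The coupling steps of Example 7 quote each reagent in equivalents relative to the resin (for example 3 eq. of acid, 3 eq. of HATU and 7.5 eq. of NMM), and the resin is later divided equally into 36 reaction vessels. The following is a minimal Python sketch of that bookkeeping. The resin scale of 0.36 mmol and the function names are hypothetical and are used only for illustration; they do not appear in the procedure.

```python
# Equivalents quoted for a coupling step in Example 7 (resin = 1 eq.).
COUPLING_EQUIVALENTS = {"acid": 3.0, "HATU": 3.0, "NMM": 7.5}


def reagent_mmol(resin_mmol, equivalents=COUPLING_EQUIVALENTS):
    """Return the mmol of each reagent needed for resin_mmol of reactive sites."""
    return {name: eq * resin_mmol for name, eq in equivalents.items()}


def per_vessel_mmol(resin_mmol, vessels=36):
    """Resin per vessel when the batch is divided equally (36 vessels in Step F)."""
    return resin_mmol / vessels


# Hypothetical scale: 0.36 mmol of resin in the collection vessel.
batch = reagent_mmol(0.36)
aliquot = per_vessel_mmol(0.36)
per_aliquot = reagent_mmol(aliquot)
```

On the assumed scale, each of the 36 aliquots carries 0.01 mmol of resin, so the per-vessel coupling would call for 0.03 mmol of carboxylic acid and HATU and 0.075 mmol of NMM.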

EXAMPLE 8 
Preparation of a Set of Compounds 
of the Formula R₁₋₃₆-Lys(ε-iNIP)-NBA-Tfp

Figure 4 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lh), where Lh is an activated ester (specifically, tetrafluorophenyl ester), L2 is an
ortho-nitrobenzylamine group with L3 being a direct bond between Lh and L2, where Lh
is joined directly to the aromatic ring of the L2 group, T has a modular structure wherein
the carboxylic acid group of lysine has been joined to the nitrogen atom of the L2
benzylamine group to form an amide bond, and a variable weight component R₁₋₃₆
(where these R groups correspond to T2 as defined herein, and may be introduced via
any of the specific carboxylic acids listed herein) is bonded through the α-amino group
of the lysine, while a mass spec enhancer group (introduced via N-methylisonipecotic
acid) is bonded through the ε-amino group of the lysine.

Referring to Figure 4:

Step A. NovaSyn HMP Resin is coupled with compound I (NBA prepared according
to the procedure of Brown et al., Molecular Diversity, 1, 4 (1995)) according to the
procedure described in step A of Example 7, to give compound II.

Steps B-J. The resin (compound II) is treated as described in steps B-J of Example 7 to
give compounds X₁₋₃₆.

EXAMPLE 9 
Preparation of a Set of Compounds 
of the Formula iNIP-Lys(ε-R₁₋₃₆)-ANP-Tfp

Figure 5 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lh), where Lh is an activated ester (specifically, tetrafluorophenyl ester), L2 is an
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The palladium step is repeated. The solvent is removed and the resin is washed with 
CH₂Cl₂ (2X), N,N-diisopropylethylammonium diethyldithiocarbamate in DMF (2X),
DMF (2X) to give compound V.

Step E. The deprotected resin from step D is coupled with N-methylisonipecotic acid as
described in step C to give compound VI.

Step F. The Fmoc protected resin VI is divided equally by the ACT357 from the
collection vessel into 36 reaction vessels to give compounds VI₁₋₃₆.

Step G. The resin (compounds VI₁₋₃₆) is treated with piperidine as described in step B to
remove the FMOC group.

Step H. The 36 aliquots of deprotected resin from step G are suspended in DMF. To
each reaction vessel is added the appropriate carboxylic acid (R₁₋₃₆CO₂H; 3 eq.), HATU
(3 eq.), and NMM (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The solvent is
removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and DMF (2X).
The coupling of R₁₋₃₆CO₂H to the aliquots of resin and the wash steps are repeated, to
give compounds VIII₁₋₃₆.

Step I. The aliquots of resin (compounds VIII₁₋₃₆) are washed with CH₂Cl₂ (3X). To
each of the reaction vessels is added 90:5:5 TFA:H₂O:CH₂Cl₂ and the vessels shaken
for 120 min. The solvent is filtered from the reaction vessels into individual tubes. The
aliquots of resin are washed with CH₂Cl₂ (2X) and MeOH (2X) and the filtrates
combined into the individual tubes. The individual tubes are evaporated in vacuo,
providing compounds IX₁₋₃₆.

Step J. Each of the free carboxylic acids IX₁₋₃₆ is dissolved in DMF. To each solution is
added pyridine (1.05 eq.), followed by tetrafluorophenyl trifluoroacetate (1.1 eq.). The
mixtures are stirred for 45 min. at room temperature. The solutions are diluted with
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the nitrogen atom of the L 2 benzylamine group to form an amide bond, and a variable 
weight component R₁₋₃₆ (where these R groups correspond to T2 as defined herein, and
may be introduced via any of the specific carboxylic acids listed herein) is bonded
through the α-amino group of the lysine, while a mass spec sensitivity enhancer group
(introduced via N-methylisonipecotic acid) is bonded through the ε-amino group of the
lysine.

Referring to Figure 3: 

Step A. NovaSyn HMP Resin (available from NovaBiochem; 1 eq.) is suspended with
DMF in the collection vessel of the ACT357. Compound I (ANP available from ACT;
3 eq.), HATU (3 eq.) and NMM (7.5 eq.) in DMF are added and the collection vessel
shaken for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH
(2X), and DMF (2X). The coupling of I to the resin and the wash steps are repeated, to
give compound II.

Step B. The resin (compound II) is mixed with 25% piperidine in DMF and shaken for
5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10
min. The solvent is removed, the resin washed with NMP (2X), MeOH (2X), and DMF
(2X), and used directly in step C.

Step C. The deprotected resin from step B is suspended in DMF and to it is added an
FMOC-protected amino acid, containing a protected amine functionality in its side
chain (Fmoc-Lysine(Aloc)-OH, available from PerSeptive Biosystems; 3 eq.), HATU (3
eq.), and NMM (7.5 eq.) in DMF. The vessel is shaken for 1 hr. The solvent is
removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X). The
coupling of Fmoc-Lys(Aloc)-OH to the resin and the wash steps are repeated, to give
compound IV.

Step D. The resin (compound IV) is washed with CH₂Cl₂ (2X), and then suspended in a
solution of (PPh₃)₄Pd(0) (0.3 eq.) and PhSiH₃ (10 eq.) in CH₂Cl₂. The mixture is
shaken for 1 hr. The solvent is removed and the resin is washed with CH₂Cl₂ (2X).
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24 


12 


none 


54 


27 


none 


25 


12.5 


none 


55 


27.5 


none 


26 


13 


none 


56 


28 


none 


27 


13.5 


none 


57 


28.5 


none 


28 


14 


none 


58 


29 


none 


29 


14.5 


none 


59 


29.5 


none 


30 


15 


none 


60 


30 


none 



The temporal appearance of the tags for set #1 is 212.1, 200.1, 196.1,
182.1, 235.2, 218.1, 199.1, 227.1, and the temporal appearance of tags for set #2 is
199.1, 227.1, 179.18, 226.1, 209.1, 198.1. Since 212.1 amu indicates the 4-
methoxybenzoic acid derivative, 200.1 indicates the 4-fluorobenzoic acid derivative,
196.1 amu indicates the toluic acid derivative, 182.1 amu indicates the benzoic acid
derivative, 235.2 amu indicates the indole-3-acetic acid derivative, 218.1 amu indicates
the 2,6-difluorobenzoic derivative, 199.1 amu indicates the nicotinic acid N-oxide
derivative, 227.1 amu indicates the 2-nitrobenzamide, 179.18 amu indicates the 5-
acetylsalicylic acid derivative, 226.1 amu indicates the 4-ethoxybenzoic acid derivative,
209.1 amu indicates the cinnamic acid derivative, and 198.1 amu indicates the 3-
aminonicotinic acid, the first sequence can be deduced as -5'-TATGCA-3'- and the
second sequence can be deduced as -5'-CGTACC-3'-. Thus, it is possible to sequence
more than one DNA sample per separation step.
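
The step from the fraction table to a temporal order of tags can be sketched in Python as follows. The sketch records, for each tag set, the order in which each mass first appears across fractions 31 to 50. The partition of the twelve masses into two sets of six follows the ODN and acid pairings listed in this example and is an assumption of the sketch; the names `FRACTIONS`, `SET_1`, `SET_2` and `temporal_order` are illustrative only.

```python
# MWs observed in fractions 31-50 of this example (fraction number -> masses).
FRACTIONS = {
    31: [212.1, 199.1], 32: [212.1, 199.1], 33: [212.1, 199.1],
    34: [212.1, 200.1, 199.1, 227.1], 35: [200.1, 199.1, 227.1],
    36: [200.1, 227.1], 37: [200.1, 227.1, 179.18],
    38: [200.1, 196.1, 179.18], 39: [200.1, 196.1, 179.18],
    40: [196.1, 179.18, 226.1], 41: [196.1, 226.1],
    42: [196.1, 182.1, 226.1], 43: [182.1, 226.1, 209.1],
    44: [182.1, 209.1], 45: [182.1, 235.2, 209.1, 198.1],
    46: [235.2, 198.1], 47: [235.2, 198.1],
    48: [235.2, 198.1, 218.1], 49: [218.1], 50: [218.1],
}

# Assumed partition of the tags between the two ODN sets.
SET_1 = {212.1, 200.1, 196.1, 182.1, 235.2, 218.1}
SET_2 = {199.1, 227.1, 179.18, 226.1, 209.1, 198.1}


def temporal_order(fractions, tag_set):
    """List the masses of tag_set in order of their first appearance."""
    order = []
    for number in sorted(fractions):
        for mass in fractions[number]:
            if mass in tag_set and mass not in order:
                order.append(mass)
    return order


order_1 = temporal_order(FRACTIONS, SET_1)
order_2 = temporal_order(FRACTIONS, SET_2)
```

Under these assumptions `order_2` reproduces the order given above for set #2, and `order_1` gives the six masses paired with the first set of ODNs. Each tag is then translated into a base according to the ODN it was attached to.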

EXAMPLE 7 

Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-Tfp

Figure 3 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lh), where Lh is an activated ester (specifically, tetrafluorophenyl ester), L2 is an
ortho-nitrobenzylamine group with L3 being a methylene group that links Lh and L2, T
has a modular structure wherein the carboxylic acid group of lysine has been joined to
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The pellet is then tested by mass spectrometry for the presence of tags. The same 
MALDI technique is employed as described in Example 4. The following MWs (tags) 
are observed in the mass spectra as a function of time: 



Fraction #   Time (min)   MWs     Fraction #   Time (min)   MWs
 1            0.5         none    31           15.5         212.1, 199.1
 2            1.0         none    32           16           212.1, 199.1
 3            1.5         none    33           16.5         212.1, 199.1
 4            2.0         none    34           17           212.1, 200.1, 199.1, 227.1
 5            2.5         none    35           17.5         200.1, 199.1, 227.1
 6            3.0         none    36           18           200.1, 227.1
 7            3.5         none    37           18.5         200.1, 227.1, 179.18
 8            4.0         none    38           19           200.1, 196.1, 179.18
 9            4.5         none    39           19.5         200.1, 196.1, 179.18
10            5.0         none    40           20           196.1, 179.18, 226.1
11            5.5         none    41           20.5         196.1, 226.1
12            6.0         none    42           21           196.1, 182.1, 226.1
13            6.5         none    43           21.5         182.1, 226.1, 209.1
14            7.0         none    44           22           182.1, 209.1
15            7.5         none    45           22.5         182.1, 235.2, 209.1, 198.1
16            8.0         none    46           23           235.2, 198.1
17            8.5         none    47           23.5         235.2, 198.1
18            9.0         none    48           24           235.2, 198.1, 218.1
19            9.5         none    49           24.5         218.1
20           10.0         none    50           25           218.1
21           10.5         none    51           25.5         none
22           11           none    52           26           none
23           11.5         none    53           26.5         none






120 minutes. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3, 2
mg/ml recrystallized cyanuric chloride and 500 ug/ml respective oligonucleotide. The
unreacted cyanuric chloride is removed by size exclusion chromatography on a G-50
Sephadex column.

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with a particular
pentafluorophenyl-ester of the following: (1) DMO 767 with 4-methoxybenzoic acid and
DMO 773 with nicotinic acid N-oxide, (2) DMO 768 with 4-fluorobenzoic acid and
DMO 774 with 2-nitrobenzoic acid, (3) toluic acid and DMO 775 with acetylsalicylic
acid, (4) DMO 769 with benzoic acid and DMO 776 with 4-ethoxybenzoic acid,
(5) DMO 770 with indole-3-acetic acid and DMO 777 with cinnamic acid, (6) DMO 771
with 2,6-difluorobenzoic acid and DMO 778 with 3-aminonicotinic acid. Therefore,
there is one set of tags for each set of ODNs.

10 ng of each of the 12 derived ODNs are mixed together and then size
separated by HPLC. The mixture is placed in 25 µl of distilled water. The entire
sample is injected on to the following column. A LiChrospher 4000 DMAE, 50-10 mm
column is used (EM Separations, Wakefield, RI). Eluent A is 20 mM Na₂HPO₄ in 20%
ACN, pH 7.4; Eluent B is Eluent A + 1 M NaCl, pH 7.4. The flow rate is 1 ml/min
and detection is UV @ 280 nm. The gradient is as follows: 0 min. @ 100% A and 0%
B, 3 min. @ 100% A and 0% B, 15 min. @ 80% A and 20% B, 60 min. @ 0% A and
100% B, 63 min. @ 0% A and 100% B, 65 min. @ 100% A and 0% B, 70 min. @
100% A and 0% B. Fractions are collected at 0.5 minute intervals.
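
The gradient program above can be turned into a small lookup that reports the eluent composition at any time, which helps relate a fraction number to the salt concentration at which it eluted. The sketch below assumes linear interpolation between the listed time points (the text gives only the set points) and assumes that fraction n ends at n x 0.5 min; the function names are illustrative.

```python
# Gradient set points from the text: (time in minutes, percent Eluent B).
GRADIENT = [(0, 0), (3, 0), (15, 20), (60, 100), (63, 100), (65, 0), (70, 0)]


def percent_b(t):
    """Percent Eluent B at time t (min), assuming linear ramps between points."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-70 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


def nacl_molar(t):
    """Added NaCl (M) at time t; Eluent B is Eluent A plus 1 M NaCl."""
    return percent_b(t) / 100.0


def fraction_end_time(n, interval=0.5):
    """End time (min) of fraction n for fractions collected every 0.5 min."""
    return n * interval
```

For example, `percent_b(fraction_end_time(31))` gives the composition at the end of fraction 31, the first fraction in which tags are observed in this example.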
To cleave the tags from the oligonucleotide, 100 µl of 0.05 M
dithiothreitol (DTT) is added to each fraction. Incubation is for 30 minutes at room
temperature. NaCl is then added to 0.1 M and 2 volumes of EtOH is added to
precipitate the ODNs. The ODNs are removed from solution by centrifugation at
14,000 x G at 4°C for 15 minutes. The supernatants are reserved, dried to completeness
under a vacuum with centrifugation. The pellets are then dissolved in 25 µl MeOH.
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toluic acid derivative, 182.1 amu indicates the benzoic acid derivative, 235.2 amu 
indicates the indole-3-acetic acid derivative, 218.1 amu indicates the 2,6- 
difluorobenzoic derivative, 199.1 amu indicates the nicotinic acid N-oxide derivative, 
227.1 amu indicates the 2-nitrobenzamide, the sequence can be deduced as
-5'-ATGCATG-3'-.



EXAMPLE 6 

Demonstration of Sequencing of Two DNA Samples in a
Single HPLC Separation Method

In this example, two DNA samples are sequenced in a single separation
method.

The following oligonucleotides are prepared as described in Example 1:

DMO 767: 5'-hexylamine-TGTAAAACGACGGCCAGT-3' (Seq. ID No. 1)
DMO 768: 5'-hexylamine-TGTAAAACGACGGCCAGTA-3' (Seq. ID No. 2)
DMO 769: 5'-hexylamine-TGTAAAACGACGGCCAGTAT-3' (Seq. ID No. 3)
DMO 770: 5'-hexylamine-TGTAAAACGACGGCCAGTATG-3' (Seq. ID No. 4)
DMO 771: 5'-hexylamine-TGTAAAACGACGGCCAGTATGC-3' (Seq. ID No. 5)
DMO 772: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCA-3' (Seq. ID No. 6)
DMO 775: 5'-hexylamine-TGTAAAACGACGGCCAGC-3' (Seq. ID No. 9)
DMO 776: 5'-hexylamine-TGTAAAACGACGGCCAGCG-3' (Seq. ID No. 10)
DMO 777: 5'-hexylamine-TGTAAAACGACGGCCAGCGT-3' (Seq. ID No. 11)
DMO 778: 5'-hexylamine-TGTAAAACGACGGCCAGCGTA-3' (Seq. ID No. 12)
DMO 779: 5'-hexylamine-TGTAAAACGACGGCCAGCGTAC-3' (Seq. ID No. 13)
DMO 780: 5'-hexylamine-TGTAAAACGACGGCCAGCGTACC-3' (Seq. ID No. 14)
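
Each set of oligonucleotides listed above forms a nested ladder in which every member is one base longer at the 3' end than the previous one. Reading the 3'-terminal base of each member in order of length gives the sequence that the separation is expected to report. The following Python sketch illustrates that reading; the dictionary and function names are illustrative and are not part of the described method.

```python
SET_1 = {
    "DMO 767": "TGTAAAACGACGGCCAGT",
    "DMO 768": "TGTAAAACGACGGCCAGTA",
    "DMO 769": "TGTAAAACGACGGCCAGTAT",
    "DMO 770": "TGTAAAACGACGGCCAGTATG",
    "DMO 771": "TGTAAAACGACGGCCAGTATGC",
    "DMO 772": "TGTAAAACGACGGCCAGTATGCA",
}
SET_2 = {
    "DMO 775": "TGTAAAACGACGGCCAGC",
    "DMO 776": "TGTAAAACGACGGCCAGCG",
    "DMO 777": "TGTAAAACGACGGCCAGCGT",
    "DMO 778": "TGTAAAACGACGGCCAGCGTA",
    "DMO 779": "TGTAAAACGACGGCCAGCGTAC",
    "DMO 780": "TGTAAAACGACGGCCAGCGTACC",
}


def ladder_sequence(fragments):
    """Read the 3'-terminal base of each fragment, shortest fragment first."""
    ordered = sorted(fragments.values(), key=len)
    # Check that each fragment extends the previous one by exactly one base.
    for shorter, longer in zip(ordered, ordered[1:]):
        if longer[:-1] != shorter:
            raise ValueError("fragments do not form a one-base ladder")
    return "".join(seq[-1] for seq in ordered)


first = ladder_sequence(SET_1)
second = ladder_sequence(SET_2)
```

The two strings obtained this way are TATGCA and CGTACC, which are the sequences deduced from the tag masses in this example.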

100 ng of each of the 5'-terminal amine-linked oligonucleotides
described above are reacted with an excess of recrystallized cyanuric chloride in 10% n-
methyl-pyrrolidone alkaline (pH 8.3 to 8.5 preferably) buffer at 19°C to 25°C for 30 to
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none 


42 
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196.1; 182.1 
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none 


43 


21.5 


182.1 




14   7.0    none    44   22     182.1
15   7.5    none    45   22.5   182.1, 235.2
16   8.0    none    46   23     235.2
17   8.5    none    47   23.5   235.2
18   9.0    none    48   24     235.2, 218.1
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9.5 


none 


49 


24.5 


218.1 




20 


10.0 


none 


50 


25 


218.1 
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10.5 


none 


51 


25.5 


218.1; 199.1 
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none 


52 


26 


199.1 




23   11.5   none    53   26.5   199.1, 227.1
24   12     none    54   27     227.1
25   12.5   none    55   27.5   227.1
26   13     none    56   28     none
27   13.5   none    57   28.5   none
28   14     none    58   29     none
29   14.5   none    59   29.5   none
30   15     none    60   30     none



The temporal appearance of the tags is thus 212.1, 200.1, 196.1, 182.1,
235.2, 218.1, 199.1, 227.1. Since 212.1 amu indicates the 4-methoxybenzoic acid
derivative, 200.1 indicates the 4-fluorobenzoic acid derivative, 196.1 amu indicates the
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temperature. The unreacted cystamine is removed by size exclusion chromatography on 
a G-50 Sephadex column. The derived ODNs are then reacted with a particular
pentafluorophenyl-ester of the following: (1) DMO 767 with 4-methoxybenzoic acid,
(2) DMO 768 with 4-fluorobenzoic acid, (3) toluic acid, (4) DMO 769 with benzoic acid,
(5) DMO 770 with indole-3-acetic acid, (6) DMO 771 with 2,6-difluorobenzoic acid,
(7) DMO 772 with nicotinic acid N-oxide, (8) DMO 773 with 2-nitrobenzoic acid.

10 ng of each of the eight derived ODNs are mixed together and then
size separated by HPLC. The mixture is placed in 25 µl of distilled water. The entire
sample is injected on to the following column. A LiChrospher 4000 DMAE, 50-10 mm
column is used (EM Separations, Wakefield, RI). Eluent A is 20 mM Na₂HPO₄ in 20%
ACN, pH 7.4; Eluent B is Eluent A + 1 M NaCl, pH 7.4. The flow rate is 1 ml/min
and detection is UV @ 280 nm. The gradient is as follows: 0 min. @ 100% A and 0%
B, 3 min. @ 100% A and 0% B, 15 min. @ 80% A and 20% B, 60 min. @ 0% A and
100% B, 63 min. @ 0% A and 100% B, 65 min. @ 100% A and 0% B, 70 min. @
100% A and 0% B. Fractions are collected at 0.5 minute intervals.

To cleave the tags from the oligonucleotide, 100 µl of 0.05 M
dithiothreitol (DTT) is added to each fraction. Incubation is for 30 minutes at room
temperature. NaCl is then added to 0.1 M and 2 volumes of EtOH is added to
precipitate the ODNs. The ODNs are removed from solution by centrifugation at
14,000 x G at 4°C for 15 minutes. The supernatants are reserved, dried to completeness
under a vacuum with centrifugation. The pellets are then dissolved in 25 µl MeOH.
The pellet is then tested by mass spectrometry for the presence of MW-identifiers. The
same MALDI technique is employed as described in Example 4. The following MWs
(tags) are observed in the mass spectra as a function of time:
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ethoxybenzoic acid derivative, (11)209.1 amu indicating cinnamic acid derivative, 
(12) 198.1 amu indicating 3-aminonicotinic acid derivative. 

The results indicate that the MW-identifiers are cleaved from the primers 
and are detectable by mass spectrometry. 
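
The assignment of an observed peak to a tag amounts to a nearest-mass lookup among the twelve MW-identifiers of this example. The sketch below illustrates this in Python. The matching tolerance of 0.2 amu is an assumption chosen because the closest listed masses (198.1, 199.1 and 200.1 amu) differ by 1.0 amu; it is not a value stated in the text, and the function name is illustrative.

```python
# Masses (amu) and identities of the twelve tags listed in this example.
TAG_MASSES = {
    212.1: "4-methoxybenzoic acid derivative",
    200.1: "4-fluorobenzoic acid derivative",
    196.1: "toluic acid derivative",
    182.1: "benzoic acid derivative",
    235.2: "indole-3-acetic acid derivative",
    218.1: "2,6-difluorobenzoic derivative",
    199.1: "nicotinic acid N-oxide derivative",
    227.1: "2-nitrobenzamide",
    179.18: "5-acetylsalicylic acid derivative",
    226.1: "4-ethoxybenzoic acid derivative",
    209.1: "cinnamic acid derivative",
    198.1: "3-aminonicotinic acid derivative",
}


def identify_tag(observed_mass, tolerance=0.2):
    """Return the tag whose listed mass is nearest, or None if none is close."""
    nearest = min(TAG_MASSES, key=lambda mass: abs(mass - observed_mass))
    if abs(nearest - observed_mass) <= tolerance:
        return TAG_MASSES[nearest]
    return None
```

A peak at 199.15 amu would be assigned to the nicotinic acid N-oxide derivative, whereas a peak far from every listed mass returns `None` and is left unassigned.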

EXAMPLE 5

Demonstration of Sequencing Using an HPLC Separation Method,
Collecting Fractions, Cleaving the MW Identifiers, Determining the
Mass (and thus the Identity) of the MW-Identifier and then
Deducing the Sequence

The following oligonucleotides are prepared as described in Example 4:

DMO 767: 5'-hexylamine-TGTAAAACGACGGCCAGT-3' (Seq. ID No. 1)
DMO 768: 5'-hexylamine-TGTAAAACGACGGCCAGTA-3' (Seq. ID No. 2)
DMO 769: 5'-hexylamine-TGTAAAACGACGGCCAGTAT-3' (Seq. ID No. 3)
DMO 770: 5'-hexylamine-TGTAAAACGACGGCCAGTATG-3' (Seq. ID No. 4)
DMO 771: 5'-hexylamine-TGTAAAACGACGGCCAGTATGC-3' (Seq. ID No. 5)
DMO 772: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCA-3' (Seq. ID No. 6)
DMO 773: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCAT-3' (Seq. ID No. 7)
DMO 774: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCATG-3' (Seq. ID No. 8)

100 µg of each of the 5'-terminal amine-linked oligonucleotides
described above are reacted with an excess of recrystallized cyanuric chloride in 10% n-
methyl-pyrrolidone alkaline (pH 8.3 to 8.5 preferably) buffer at 19°C to 25°C for 30 to
120 minutes. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3, 2
mg/ml recrystallized cyanuric chloride and 500 ug/ml respective oligonucleotide. The
unreacted cyanuric chloride is removed by size exclusion chromatography on a G-50
Sephadex column.

The activated purified oligonucleotide is then reacted with a 100-fold 
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room 
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molar excess) with the pentafluorophenyl-esters of either: (1) 4-methoxybenzoic acid, 
(2) 4-fluorobenzoic acid, (3) toluic acid, (4) benzoic acid, (5) indole-3-acetic acid,
(6) 2,6-difluorobenzoic acid, (7) nicotinic acid N-oxide, (8) 2-nitrobenzoic acid,
(9) 5-acetylsalicylic acid, (10) 4-ethoxybenzoic acid, (11) cinnamic acid,
(12) 3-aminonicotinic acid. The reaction is for 2 hours at 37°C in 0.2 M NaBorate pH 8.3.
The derived ODNs are purified by gel exclusion chromatography on G-50 Sephadex.

To cleave the tag from the oligonucleotide, the ODNs are adjusted to 1 x
10⁻⁵ molar and then dilutions are made (12, 3-fold dilutions) in TE (TE is 0.01 M Tris,
pH 7.0, 5 mM EDTA) with 50% EtOH (V/V). To 100 µl volumes of ODNs 25 µl of
0.01 M dithiothreitol (DTT) is added. To an identical set of controls no DTT is added.
Incubation is for 30 minutes at room temperature. NaCl is then added to 0.1 M and 2
volumes of EtOH is added to precipitate the ODNs. The ODNs are removed from
solution by centrifugation at 14,000 x G at 4°C for 15 minutes. The supernatants are
reserved, dried to completeness. The pellet is then dissolved in 25 µl MeOH. The
pellet is then tested by mass spectrometry for the presence of tags.

The mass spectrometer used in this work is an external ion source
Fourier-transform mass spectrometer (FTMS). Samples prepared for MALDI analysis
are deposited on the tip of a direct probe and inserted into the ion source. When the
sample is irradiated with a laser pulse, ions are extracted from the source and passed
into a long quadrupole ion guide that focuses and transports them to an FTMS analyzer
cell located inside the bore of a superconducting magnet.

The spectra yield the following information. Peaks varying in intensity
from 25 to 100 relative intensity units are observed at the following molecular weights:
(1) 212.1 amu indicating 4-methoxybenzoic acid derivative, (2) 200.1 indicating
4-fluorobenzoic acid derivative, (3) 196.1 amu indicating toluic acid derivative,
(4) 182.1 amu indicating benzoic acid derivative, (5) 235.2 amu indicating
indole-3-acetic acid derivative, (6) 218.1 amu indicating 2,6-difluorobenzoic derivative,
(7) 199.1 amu indicating nicotinic acid N-oxide derivative, (8) 227.1 amu indicating
2-nitrobenzamide,
(9) 179.18 amu indicating 5-acetylsalicylic acid derivative, (10) 226.1 amu indicating 4-
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Moles of 


Fluorochrome     RFU non-cleaved   RFU cleaved   RFU free
1.0 x 10⁻⁵ M     6.4               1200          1345
3.3 x 10⁻⁶ M     2.4               451           456
1.1 x 10⁻⁶ M     U.V               1             1 JU
3.7 x 10⁻⁷ M     0.3               44            48
1.2 x 10⁻⁷ M     0.12              15.3          16.0
4.1 x 10⁻⁸ M     0.14              4.9           5.1
1.4 x 10⁻⁸ M     0.13              2.5           2.8
4.5 x 10⁻⁹ M     0.12              0.8           0.9

The data indicate that there is about a 200-fold increase in relative fluorescence when 
the fluorochrome is cleaved from the ODN. 
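
The size of the increase can be checked directly from the tabulated values by dividing the cleaved signal by the non-cleaved signal at each concentration. The Python sketch below does this for four legible rows of the table; which rows underlie the "about 200-fold" figure is not stated in the text, so the selection here is an assumption, and the names are illustrative.

```python
# (concentration in M, RFU non-cleaved, RFU cleaved) for four legible rows.
ROWS = [
    (1.0e-5, 6.4, 1200.0),
    (3.3e-6, 2.4, 451.0),
    (3.7e-7, 0.3, 44.0),
    (1.2e-7, 0.12, 15.3),
]


def fold_increase(non_cleaved, cleaved):
    """Ratio of cleaved to non-cleaved relative fluorescence."""
    return cleaved / non_cleaved


ratios = [fold_increase(non_cleaved, cleaved) for _, non_cleaved, cleaved in ROWS]
```

At the two highest concentrations the ratio is close to 190, and it falls toward the background level as the sample is diluted.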

EXAMPLE 4
Preparation of Tagged M13 Sequence Primers
and Demonstration of Cleavage of Tags

Preparation of 2,4,6-trichlorotriazine derived oligonucleotides: 1000 µg
of 5'-terminal amine linked oligonucleotide (5'-hexylamine-
TGTAAAACGACGGCCAGT-3') (Seq. ID No. 1) are reacted with an excess of
recrystallized cyanuric chloride in 10% n-methyl-pyrrolidone alkaline (pH 8.3 to 8.5
preferably) buffer at 19 to 25°C for 30 to 120 minutes. The final reaction conditions
consist of 0.15 M sodium borate at pH 8.3, 2 mg/ml recrystallized cyanuric chloride and
500 ug/ml respective oligonucleotide. The unreacted cyanuric chloride is removed by
size exclusion chromatography on a G-50 Sephadex column.

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with a variety of amides.
The derived ODN preparation is divided into 12 portions and each portion is reacted (25
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500 ug/ml respective oligonucleotide. The unreacted cyanuric chloride is removed by 
size exclusion chromatography on a G-50 Sephadex (Pharmacia, Piscataway, NJ) 
column. 

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with amine-reactive
fluorochromes. The derived ODN preparation is divided into 3 portions and each
portion is reacted with either (a) 20-fold molar excess of Texas Red sulfonyl chloride
(Molecular Probes, Eugene, OR), (b) 20-fold molar excess of Lissamine sulfonyl
chloride (Molecular Probes, Eugene, OR), or (c) 20-fold molar excess of fluorescein
isothiocyanate. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3
for 1 hour at room temperature. The unreacted fluorochromes are removed by size
exclusion chromatography on a G-50 Sephadex column.

To cleave the fluorochrome from the oligonucleotide, the ODNs are
adjusted to 1 x 10^-5 molar and then dilutions are made (12, 3-fold dilutions) in TE (TE is
0.01 M Tris, pH 7.0, 5 mM EDTA). To 100 µl volumes of ODNs, 25 µl of 0.01 M
dithiothreitol (DTT) is added. To an identical set of controls no DTT is added. The
mixture is incubated for 15 minutes at room temperature. Fluorescence is measured in a
black microtiter plate. The solution is removed from the incubation tubes (150
microliters) and placed in a black microtiter plate (Dynatek Laboratories, Chantilly,
VA). The plates are then read directly using a Fluoroskan II fluorometer (Flow
Laboratories, McLean, VA) using an excitation wavelength of 495 nm and monitoring
emission at 520 nm for fluorescein, using an excitation wavelength of 591 nm and
monitoring emission at 612 nm for Texas Red, and using an excitation wavelength of
570 nm and monitoring emission at 590 nm for lissamine.
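
The concentrations implied by the cleavage protocol can be checked with a short calculation. The following is a minimal sketch, not part of the original procedure: it assumes the starting ODN concentration is 1 x 10^-5 M, that each of the 12 dilutions is a further 3-fold step from that stock, and that the 25 µl of 0.01 M DTT is simply mixed into the 100 µl sample. The function names are illustrative only.

```python
def dilution_series(start_molar, fold, steps):
    """Concentration after each successive dilution of the starting stock."""
    return [start_molar / fold ** i for i in range(1, steps + 1)]


def final_concentration(stock_molar, stock_ul, sample_ul):
    """Concentration of a reagent after adding stock_ul of stock to sample_ul of sample."""
    return stock_molar * stock_ul / (stock_ul + sample_ul)


# Assumed reading of the text: 12 successive 3-fold dilutions from 1e-5 M.
series = dilution_series(1e-5, 3, 12)

# 25 ul of 0.01 M DTT added to a 100 ul ODN sample.
dtt_molar = final_concentration(0.01, 25, 100)

print(f"first dilution: {series[0]:.2e} M")
print(f"last dilution:  {series[-1]:.2e} M")
print(f"DTT after addition: {dtt_molar * 1e3:.1f} mM")
```

Under these assumptions the series spans roughly six orders of magnitude, which is consistent with a titration meant to compare cleaved and uncleaved fluorescence over a wide concentration range.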
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New England Ultraviolet Co., Middletown, CT) with an emission peak at 350 nm is 
used as a source of UV light. The lamp is placed at a 15-cm distance from the Petri 
dishes with samples. SDS gel electrophoresis shows that >85% of the conjugate is 
cleaved under these conditions. 

EXAMPLE 3 

Preparation of Fluorescent Labeled Primers and 
Demonstration of Cleavage of Fluorophore 



Synthesis and Purification of Oligonucleotides

The oligonucleotides (ODNs) are prepared on automated DNA
synthesizers using the standard phosphoramidite chemistry supplied by the vendor, or
the H-phosphonate chemistry (Glen Research, Sterling, VA). Appropriately blocked
dA, dG, dC, and T phosphoramidites are commercially available in these forms, and
synthetic nucleosides may readily be converted to the appropriate form.
Oligonucleotides are purified by adaptations
of standard methods. Oligonucleotides with 5'-trityl groups are chromatographed on
HPLC using a 12 micrometer, 300 Å Rainin (Emeryville, CA) Dynamax C-8 4.2x250
mm reverse phase column using a gradient of 15% to 55% MeCN in 0.1 N
Et3NH+OAc-, pH 7.0, over 20 min. When detritylation is performed, the
oligonucleotides are further purified by gel exclusion chromatography. Analytical
checks for the quality of the oligonucleotides are conducted with a PRP-column
(Alltech, Deerfield, IL) at alkaline pH and by PAGE.

Preparation of 2,4,6-trichlorotriazine derived oligonucleotides: 10 to
1000 µg of 5'-terminal amine linked oligonucleotide are reacted with an excess of
recrystallized cyanuric chloride in 10% n-methyl-pyrrolidone in alkaline (pH 8.3 to 8.5
preferably) buffer at 19°C to 25°C for 30 to 120 minutes. The final reaction conditions
consist of 0.15 M sodium borate at pH 8.3, 2 mg/ml recrystallized cyanuric chloride and



WO 97/27331 



PCT/US97/01304 




DMF (2X). The deprotected resin is then divided equally by the ACT357 from the
collection vessel into 16 reaction vessels.

Step F. The 16 aliquots of deprotected resin from step E are suspended in DMF. To
each reaction vessel is added the appropriate carboxylic acid VII1-16 (R1-16CO2H; 3 eq.),
HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The
solvent is removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and
DMF (2X). The coupling of VII1-16 to the aliquots of resin and the wash steps are
repeated, to give compounds VIII1-16.

Step G. The aliquots of resin (compounds VIII1-16) are washed with CH2Cl2 (3X). To
each of the reaction vessels is added 1% TFA in CH2Cl2 and the vessels shaken for 30
min. The solvent is filtered from the reaction vessels into individual tubes. The
aliquots of resin are washed with CH2Cl2 (2X) and MeOH (2X) and the filtrates
combined into the individual tubes. The individual tubes are evaporated in vacuo,
providing compounds IX1-16.

Step H. Each of the free carboxylic acids IX1-16 is dissolved in DMF. To each solution
is added pyridine (1.05 eq.), followed by pentafluorophenyl trifluoroacetate (1.1 eq.).
The mixtures are stirred for 45 min. at room temperature. The solutions are diluted with
EtOAc, washed with 1 M aq. citric acid (3X) and 5% aq. NaHCO3 (3X), dried over
Na2SO4, filtered, and evaporated in vacuo, providing compounds X1-16.

EXAMPLE 2

Demonstration of Photolytic Cleavage
of T-L-X

A T-L-X compound as prepared in Example 13 was irradiated with near-UV
light for 7 min at room temperature. A Rayonett fluorescence UV lamp (Southern






mixture is diluted with EtOAc, washed with 1 N HCl (2X), pH 9.5 carbonate buffer
(2X), and brine (1X), dried over Na2SO4, and evaporated in vacuo to give the allyl ester
of compound I.

Step B. The allyl ester of compound I from step A (1.75 eq.) is combined in CH2Cl2
with an FMOC-protected amino acid containing amine functionality in its side chain
(compound II, e.g. alpha-N-FMOC-3-(3-pyridyl)-alanine, available from Synthetech,
Albany, OR; 1 eq.), N-methylmorpholine (2.5 eq.), and HATU (1.1 eq.), and stirred at
room temperature for 4 hr. The mixture is diluted with CH2Cl2, washed with 1 M aq.
citric acid (2X), water (1X), and 5% aq. NaHCO3 (2X), dried over Na2SO4, and
evaporated in vacuo. Compound III is isolated by flash chromatography (CH2Cl2 -->
EtOAc).

Step C. Compound III is dissolved in CH2Cl2, Pd(PPh3)4 (0.07 eq.) and N-methylaniline
(2 eq.) are added, and the mixture stirred at room temperature for 4 hr. The mixture is
diluted with CH2Cl2, washed with 1 M aq. citric acid (2X) and water (1X), dried over
Na2SO4, and evaporated in vacuo. Compound IV is isolated by flash chromatography
(CH2Cl2 --> EtOAc + HOAc).

Step D. TentaGel S AC resin (compound V; 1 eq.) is suspended with DMF in the
collection vessel of the ACT357 peptide synthesizer (Advanced ChemTech Inc. (ACT),
Louisville, KY). Compound IV (3 eq.), HATU (3 eq.) and DIEA (7.5 eq.) in DMF are
added and the collection vessel shaken for 1 hr. The solvent is removed and the resin
washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of IV to the resin
and the wash steps are repeated, to give compound VI.

Step E. The resin (compound VI) is mixed with 25% piperidine in DMF and shaken for
5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10
min. The solvent is removed and the resin washed with NMP (2X), MeOH (2X), and
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Step D . The resin (compound V) is treated with piperidine as described in step B to 
remove the FMOC group. The deprotected resin is then divided equally by the ACT357 
from the collection vessel into 16 reaction vessels. 

Step E. The 16 aliquots of deprotected resin from step D are suspended in DMF. To
each reaction vessel is added the appropriate carboxylic acid VI1-16 (R1-16CO2H; 3 eq.),
HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The
solvent is removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and
DMF (2X). The coupling of VI1-16 to the aliquots of resin and the wash steps are
repeated, to give compounds VII1-16.

Step F. The aliquots of resin (compounds VII1-16) are washed with CH2Cl2 (3X). To
each of the reaction vessels is added 1% TFA in CH2Cl2 and the vessels shaken for 30
min. The solvent is filtered from the reaction vessels into individual tubes. The
aliquots of resin are washed with CH2Cl2 (2X) and MeOH (2X) and the filtrates
combined into the individual tubes. The individual tubes are evaporated in vacuo,
providing compounds VIII1-16.

Step G. Each of the free carboxylic acids VIII1-16 is dissolved in DMF. To each
solution is added pyridine (1.05 eq.), followed by pentafluorophenyl trifluoroacetate
(1.1 eq.). The mixtures are stirred for 45 min. at room temperature. The solutions are
diluted with EtOAc, washed with 1 M aq. citric acid (3X) and 5% aq. NaHCO3 (3X),
dried over Na2SO4, filtered, and evaporated in vacuo, providing compounds IX1-16.

B. Synthesis of Pentafluorophenyl Esters of Chemically Cleavable Mass
Spectroscopy Tags, to Liberate Tags with Carboxyl Acid Termini
Figure 2 shows the reaction scheme.

Step A. 4-(Hydroxymethyl)phenoxybutyric acid (compound I; 1 eq.) is combined with
DIEA (2.1 eq.) and allyl bromide (2.1 eq.) in CHCl3 and heated to reflux for 2 hr. The
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EXAMPLES 



EXAMPLE 1

Preparation of Acid Labile Linkers for Use in
Cleavable-MW-Identifier Sequencing

A. Synthesis of Pentafluorophenyl Esters of Chemically Cleavable Mass
Spectroscopy Tags, to Liberate Tags with Carboxyl Amide Termini

Figure 1 shows the reaction scheme.

Step A. TentaGel S AC resin (compound II; available from ACT; 1 eq.) is suspended
with DMF in the collection vessel of the ACT357 peptide synthesizer (ACT).
Compound I (3 eq.), HATU (3 eq.) and DIEA (7.5 eq.) in DMF are added and the
collection vessel shaken for 1 hr. The solvent is removed and the resin washed with
NMP (2X), MeOH (2X), and DMF (2X). The coupling of I to the resin and the wash
steps are repeated, to give compound III.

Step B. The resin (compound III) is mixed with 25% piperidine in DMF and shaken for
5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10
min. The solvent is removed, the resin washed with NMP (2X), MeOH (2X), and DMF
(2X), and used directly in step C.

Step C. The deprotected resin from step B is suspended in DMF and to it is added an
FMOC-protected amino acid, containing amine functionality in its side chain
(compound IV, e.g. alpha-N-FMOC-3-(3-pyridyl)-alanine, available from Synthetech,
Albany, OR; 3 eq.), HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessel is shaken
for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH (2X),
and DMF (2X). The coupling of IV to the resin and the wash steps are repeated, to give
compound V.
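
The coupling steps state reagent amounts in equivalents relative to the resin. As a minimal sketch of how those equivalents translate into molar amounts, the calculation below assumes a hypothetical resin mass and loading (1.0 g at 0.25 mmol/g); neither value is given in the text, and the function name is illustrative only.

```python
def reagent_mmol(resin_g, loading_mmol_per_g, equivalents):
    """Convert equivalents (relative to resin-bound sites) into mmol of each reagent."""
    resin_mmol = resin_g * loading_mmol_per_g
    return {name: eq * resin_mmol for name, eq in equivalents.items()}


# Equivalents as stated for Step C; resin mass and loading are assumed values.
STEP_C_EQUIVALENTS = {
    "FMOC-amino acid (compound IV)": 3.0,
    "HATU": 3.0,
    "DIEA": 7.5,
}

amounts = reagent_mmol(resin_g=1.0, loading_mmol_per_g=0.25,
                       equivalents=STEP_C_EQUIVALENTS)

for name, mmol in amounts.items():
    print(f"{name}: {mmol:.3f} mmol per coupling")
```

Because each coupling and wash cycle is repeated, the total reagent consumed per step is twice the per-coupling amount.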
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ACT357 = ACT357 peptide synthesizer from Advanced ChemTech, Inc., Louisville, 
KY 

ACT = Advanced ChemTech, Inc., Louisville, KY
NovaBiochem = CalBiochem-NovaBiochem International, San Diego, CA
TFA = Trifluoroacetic acid
Tfa = Trifluoroacetyl
iNIP = N-Methylisonipecotic acid
Tfp = Tetrafluorophenyl
DIAEA = 2-(Diisopropylamino)ethylamine
MCT = monochlorotriazine
5'-AH-ODN = 5'-aminohexyl-tailed oligodeoxynucleotide
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particular unique tag. The tags map to either a sample type, a dideoxy terminator type, 
(in the case of a Sanger sequencing reaction) or preferably both. Specifically, the tag
maps to a primer type which in turn maps to a vector type which in turn maps to a
sample identity. The tag may also map to a dideoxy terminator type (ddTTP,
ddCTP, ddGTP, ddATP) by reference to the dideoxynucleotide reaction into which the
tagged primer is placed. The sequencing reaction is then performed and the resulting
fragments are sequentially separated by size in time.

The tags are cleaved from the fragments in a temporal frame and
measured and recorded in a temporal frame. The sequence is constructed by comparing
the tag map to the temporal frame. That is, all tag identities are recorded in time after
the sizing step and become related to one another in a temporal frame. The
sizing step separates the nucleic acid fragments by a one nucleotide increment and
hence the related tag identities are separated by a one nucleotide increment. By
foreknowledge of the dideoxy-terminator or nucleotide map and sample type, the
sequence is readily deduced in a linear fashion.
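
The decoding logic described above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the patented implementation: the tag names, sample names, and detection times are hypothetical, each tag is assumed to map to exactly one (sample, terminator base) pair, and the shortest fragments are assumed to elute first so that the detection order gives the synthesized strand from 5' to 3'.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class TagInfo(NamedTuple):
    sample: str  # sample identity (via primer and vector type)
    base: str    # base implied by the dideoxy terminator reaction


# Hypothetical tag map: eight tags cover two samples x four terminators.
TAG_MAP: Dict[str, TagInfo] = {
    "t1": TagInfo("sampleA", "A"), "t2": TagInfo("sampleA", "C"),
    "t3": TagInfo("sampleA", "G"), "t4": TagInfo("sampleA", "T"),
    "t5": TagInfo("sampleB", "A"), "t6": TagInfo("sampleB", "C"),
    "t7": TagInfo("sampleB", "G"), "t8": TagInfo("sampleB", "T"),
}


def decode(events: Iterable[Tuple[float, str]],
           tag_map: Dict[str, TagInfo]) -> Dict[str, str]:
    """Rebuild one read per sample from (detection_time, tag) events."""
    reads: Dict[str, list] = {}
    for _, tag in sorted(events):  # temporal order = fragment length order
        info = tag_map[tag]
        reads.setdefault(info.sample, []).append(info.base)
    return {sample: "".join(bases) for sample, bases in reads.items()}


# Tags from two samples detected in the same run, listed out of order.
events = [(3.0, "t4"), (1.0, "t3"), (2.0, "t1"),
          (1.0, "t6"), (3.0, "t5"), (2.0, "t8")]
reads = decode(events, TAG_MAP)
print(reads)
```

Since every tag identifies its own sample, reads from several templates can share one separation lane and still be deconvoluted afterwards.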

The following examples are offered by way of illustration, and not by 
way of limitation. 

Unless otherwise stated, chemicals as used in the examples may be
obtained from Aldrich Chemical Company, Milwaukee, WI. The following
abbreviations, with the indicated meanings, are used herein:
ANP = 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic acid
NBA = 4-(Fmoc-aminomethyl)-3-nitrobenzoic acid
HATU = O-7-azabenzotriazol-1-yl-N,N,N',N'-tetramethyluronium hexafluorophosphate
DIEA = diisopropylethylamine
MCT = monochlorotriazine
NMM = 4-methylmorpholine
NMP = N-methylpyrrolidone
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The following is a list of representative vendors for separation and 
detection technologies which may be used in the present invention. Hoefer Scientific 
Instruments (San Francisco, CA) manufactures electrophoresis equipment (Two Step™, 
Poker Face™ II) for sequencing applications. Pharmacia Biotech (Piscataway, NJ) 
manufactures electrophoresis equipment for DNA separations and sequencing
(PhastSystem for PCR-SSCP analysis, MacroPhor System for DNA sequencing).
Perkin Elmer/Applied Biosystems Division (ABI, Foster City, CA) manufactures
semi-automated sequencers based on fluorescent dyes (ABI373 and ABI377). Analytical
Spectral Devices (Boulder, CO) manufactures UV spectrometers. Hitachi Instruments
(Tokyo, Japan) manufactures Atomic Absorption spectrometers, Fluorescence
spectrometers, LC and GC Mass Spectrometers, NMR spectrometers, and UV-VIS
Spectrometers. PerSeptive Biosystems (Framingham, MA) produces Mass
Spectrometers (Voyager™ Elite). Bruker Instruments Inc. (Manning Park, MA)
manufactures FTIR Spectrometers (Vector 22), FT-Raman Spectrometers, Time of
Flight Mass Spectrometers (Reflex II™), Ion Trap Mass Spectrometer (Esquire™) and
a MALDI Mass Spectrometer. Analytical Technology Inc. (ATI, Boston, MA) makes
Capillary Gel Electrophoresis units, UV detectors, and Diode Array Detectors.
Teledyne Electronic Technologies (Mountain View, CA) manufactures an Ion Trap
Mass Spectrometer (3DQ Discovery™ and the 3DQ Apogee™). Perkin Elmer/Applied
Biosystems Division (Foster City, CA) manufactures a Sciex Mass Spectrometer (triple
quadrupole LC/MS/MS, the API 100/300) which is compatible with electrospray.
Hewlett-Packard (Santa Clara, CA) produces Mass Selective Detectors (HP 5972A),
MALDI-TOF Mass Spectrometers (HP G2025A), Diode Array Detectors, CE units,
HPLC units (HP 1090) as well as UV Spectrometers. Finnigan Corporation (San Jose,
CA) manufactures mass spectrometers (magnetic sector (MAT 95 S™), quadrupole
spectrometers (MAT 95 SQ™) and four other related mass spectrometers). Rainin
(Emeryville, CA) manufactures HPLC instruments.

The methods and compositions described herein permit the use of
cleaved tags to serve as maps to particular sample type and nucleotide identity. At the
beginning of each sequencing method, a particular (selected) primer is assigned a
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methodology and instrumentation with the detection methodology and instrumentation 
forming a single device. For example, an interface is interposed between a separation 
technique and detection by mass spectrometry or potentiostatic amperometry.

The function of the interface is primarily the release of the (e.g., mass
spectrometry) tag from analyte. There are several representative implementations of the
interface. The design of the interface is dependent on the choice of cleavable linkers.
In the case of light or photo-cleavable linkers, an energy or photon source is required. In
the case of an acid-labile linker, a base-labile linker, or a disulfide linker, reagent
addition is required within the interface. In the case of heat-labile linkers, an energy
heat source is required. Enzyme addition is required for an enzyme-sensitive linker
such as a specific protease and a peptide linker, a nuclease and a DNA or RNA linker, a
glycosylase, HRP or phosphatase and a linker which is unstable after cleavage (e.g.,
similar to chemiluminescent substrates). Other characteristics of the interface include
minimal band broadening and separation of DNA from tags before injection into a mass
spectrometer. Separation techniques include those based on electrophoretic methods
and techniques, affinity techniques, size retention (dialysis), filtration and the like.

It is also possible to concentrate the tags (or nucleic acid-linker-tag
construct), capture them electrophoretically, and then release them into an alternate
reagent stream which is compatible with the particular type of ionization method selected. The
interface may also be capable of capturing the tags (or nucleic acid-linker-tag construct)
on microbeads, shooting the bead(s) into a chamber and then performing laser
desorption/vaporization. It is also possible to extract in flow into an alternate buffer (e.g.,
from capillary electrophoresis buffer into hydrophobic buffer across a permeable
membrane). It may also be desirable in some uses to deliver tags into the mass
spectrometer intermittently, which would comprise a further function of the interface.
Another function of the interface is to deliver tags from multiple columns into a mass
spectrometer, with a rotating time slot for each column. Also, it is possible to deliver
tags from a single column into multiple MS detectors, separated by time, collect each
set of tags for a few milliseconds, and then deliver to a mass spectrometer.
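
The rotating time slot arrangement for several columns feeding one mass spectrometer can be sketched as a simple round-robin schedule. This is an illustrative sketch only: the column names, the 5 ms slot length, and the function names are assumptions and do not come from the text.

```python
from itertools import cycle, islice
from typing import List, Sequence, Tuple


def slot_schedule(columns: Sequence[str], slot_ms: float,
                  n_slots: int) -> List[Tuple[float, float, str]]:
    """Return (start_ms, end_ms, column) entries for a round-robin schedule."""
    schedule = []
    for i, column in enumerate(islice(cycle(columns), n_slots)):
        schedule.append((i * slot_ms, (i + 1) * slot_ms, column))
    return schedule


def column_at(time_ms: float, columns: Sequence[str], slot_ms: float) -> str:
    """Column whose tags are being delivered to the detector at time_ms."""
    return columns[int(time_ms // slot_ms) % len(columns)]


columns = ["column1", "column2", "column3"]  # hypothetical column labels
for start, end, column in slot_schedule(columns, slot_ms=5, n_slots=6):
    print(f"{start:5.1f} to {end:5.1f} ms: {column}")
```

Each detected tag is then attributed to a column by the slot in which it arrives, so the slot length must be short relative to the spacing between successive fragments from any one column.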
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method is that of use of an electric field wherein DNA fragments are collected onto or 
near an electrode. A second method is that wherein the DNA fragments are collected by 
flowing a stream of liquid past the bottom of a gel. Aspects of both methods can be 
combined wherein DNA is collected into a flowing stream which can be later concentrated
by use of an electric field. The end result is that DNA fragments are removed from the
milieu under which the separation method was performed. That is, DNA fragments can
be "dragged" from one solution type to another by use of an electric field.

Once the DNA fragments are in the appropriate solution (compatible
with electrospray and mass spectrometry) the tag can be cleaved from the DNA
fragment. The DNA fragment (or remnants thereof) can then be separated from the tag
by the application of an electric field (preferably, the tag is of opposite charge to that of
the DNA fragment). The tag is then introduced into the electrospray device by the use of an
electric field or a flowing liquid.

Fluorescent tags can be identified and quantitated most directly by their
absorption and fluorescence emission wavelengths and intensities.

While a conventional spectrofluorometer is extremely flexible, providing
continuous ranges of excitation and emission wavelengths (λEX, λS1, λS2), more specialized
instruments such as flow cytometers and laser-scanning microscopes require probes that
are excitable at a single fixed wavelength. In contemporary instruments, this is usually
the 488-nm line of the argon laser.

Fluorescence intensity per probe molecule is proportional to the product
of ε and QY. The range of these parameters among fluorophores of current practical
importance is approximately 10,000 to 100,000 cm^-1 M^-1 for ε and 0.1 to 1.0 for QY.
When absorption is driven toward saturation by high-intensity illumination, the
irreversible destruction of the excited fluorophore (photobleaching) becomes the factor
limiting fluorescence detectability. The practical impact of photobleaching depends on
the fluorescent detection technique in question.
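
The proportionality between per-molecule fluorescence intensity and the product of extinction coefficient and quantum yield can be made concrete with a short calculation. The sketch below uses only the endpoint values of the ranges quoted above (10,000 to 100,000 cm^-1 M^-1 and 0.1 to 1.0); the function name is illustrative, and the result is a relative figure of merit rather than an absolute intensity.

```python
def relative_brightness(extinction_coefficient, quantum_yield):
    """Relative per-molecule brightness, proportional to epsilon x QY."""
    return extinction_coefficient * quantum_yield


# Endpoints of the ranges quoted for practical fluorophores.
dimmest = relative_brightness(10_000, 0.1)
brightest = relative_brightness(100_000, 1.0)

print(f"dimmest:   {dimmest:.0f}")
print(f"brightest: {brightest:.0f}")
print(f"dynamic range: {brightest / dimmest:.0f}-fold")
```

On this measure the quoted ranges span about two orders of magnitude, which is why fluorophore choice matters before photobleaching becomes the limiting factor.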

It will be evident to one in the art that a device (an interface) may be
interposed between the separation and detection steps to permit the continuous
operation of size separation and tag detection (in real time). This unites the separation
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The efficiency of ESI can be very high, providing the basis for extremely 
sensitive measurements, which is useful for the invention described herein. Current
instrumental performance can provide a total ion current at the detector of about
2 x 10^-12 A or about 10^7 counts/s for singly charged species. On the basis of the instrumental
performance, concentrations of as low as 10^-10 M or about 10^-18 mol/s of a singly
charged species will give detectable ion current (about 10 counts/s) if the analyte is
completely ionized. For example, low attomole detection limits have been obtained for
quaternary ammonium ions using an ESI interface with capillary zone electrophoresis
(Smith et al., Anal. Chem. 59:1230, 1988). For a compound of molecular weight of
1000, the average number of charges is 1, the approximate number of charge states is 1,
peak width (m/z) is 1 and the maximum intensity (ion/s) is 1 x 10^12.
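
The sensitivity figures in this paragraph can be cross-checked with a few unit conversions. The sketch below is an order-of-magnitude check under stated assumptions: the exponents are read as 2 x 10^-12 A, 10^-10 M and 10^-18 mol/s, the flow rate is taken as 1 µL/min (the low end of the typical ESI range given elsewhere in this description), and the detected fraction of ions is taken as 10^-5. The function names are illustrative.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per elementary charge
AVOGADRO = 6.02214076e23             # entities per mole


def ions_per_second(current_amps, charge_state=1):
    """Ion arrival rate corresponding to a given ion current."""
    return current_amps / (charge_state * ELEMENTARY_CHARGE)


def molar_flow(conc_molar, flow_ul_per_min):
    """Analyte delivery rate in mol/s for a given concentration and flow."""
    return conc_molar * flow_ul_per_min * 1e-6 / 60.0


def detected_counts(mol_per_s, detected_fraction):
    """Counts/s if only a fraction of the delivered molecules is detected as ions."""
    return mol_per_s * AVOGADRO * detected_fraction


print(f"2e-12 A  -> {ions_per_second(2e-12):.2e} ions/s")
print(f"1e-10 M at 1 uL/min -> {molar_flow(1e-10, 1):.2e} mol/s")
print(f"1e-18 mol/s at 1e-5 detected -> {detected_counts(1e-18, 1e-5):.1f} counts/s")
```

Under these assumptions the three quoted figures are mutually consistent to within an order of magnitude: a picoampere-level current corresponds to about 10^7 singly charged ions per second, and an attomole-per-second delivery rate yields on the order of 10 counts/s.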

Remarkably little sample is actually consumed in obtaining an ESI mass
spectrum (Smith et al., Anal. Chem. 60:1948, 1988). Substantial gains might also be
obtained by the use of array detectors with sector instruments, allowing simultaneous
detection of portions of the spectrum. Since currently only about 10^-5 of all ions formed
by ESI are detected, attention to the factors limiting instrument performance may
provide a basis for improved sensitivity. It will be evident to those in the art that the
present invention contemplates and accommodates improvements in ionization and
detection methodologies.

An interface is preferably placed between the separation instrumentation
(e.g., gel) and the detector (e.g., mass spectrometer). The interface preferably has the
following properties: (1) the ability to collect the DNA fragments at discrete time
intervals, (2) concentrate the DNA fragments, (3) remove the DNA fragments from the
electrophoresis buffers and milieu, (4) cleave the tag from the DNA fragment,
(5) separate the tag from the DNA fragment, (6) dispose of the DNA fragment, (7) place
the tag in a volatile solution, (8) volatilize and ionize the tag, and (9) place or transport
the tag to an electrospray device that introduces the tag into a mass spectrometer.

The interface also has the capability of "collecting" DNA fragments as
they elute from the bottom of a gel. The gel may be composed of a slab gel, a tubular
gel, a capillary, etc. The DNA fragments can be collected by several methods. The first
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distance from the capillary the droplet diameter is often quite uniform and on the order 
of 1 µm. Of particular importance is that the total electrospray ion current increases
only slightly for higher liquid flow rates. There is evidence that heating is useful for
manipulating the electrospray. For example, slight heating allows aqueous solutions to
be readily electrosprayed, presumably due to the decreased viscosity and surface
tension. Both thermally-assisted and gas-nebulization-assisted electrosprays allow
higher liquid flow rates to be used, but decrease the extent of droplet charging. The
formation of molecular ions requires conditions effecting evaporation of the initial
droplet population. This can be accomplished at higher pressures by a flow of dry gas
at moderate temperatures (<60°C), by heating during transport through the interface,
and (particularly in the case of ion trapping methods) by energetic collisions at
relatively low pressure.

Although the detailed processes underlying ESI remain uncertain, the
very small droplets produced by ESI appear to allow almost any species carrying a net
charge in solution to be transferred to the gas phase after evaporation of residual
solvent. Mass spectrometric detection then requires that ions have a tractable m/z range
(<4000 daltons for quadrupole instruments) after desolvation, and that they be produced
and transmitted with sufficient efficiency. The wide range of solutes already found to
be amenable to ESI-MS, and the lack of substantial dependence of ionization efficiency
upon molecular weight, suggest a highly non-discriminating and broadly applicable
ionization process.

The electrospray ion "source" functions at near atmospheric pressure.
The electrospray "source" is typically a metal or glass capillary incorporating a method
for electrically biasing the liquid solution relative to a counter electrode. Solutions,
typically water-methanol mixtures containing the analyte and often other additives such
as acetic acid, flow to the capillary terminus. An ESI source has been described (Smith
et al., Anal. Chem. 62:885, 1990) which can accommodate essentially any solvent
system. Typical flow rates for ESI are 1-10 uL/min. The principal requirement of an
ESI-MS interface is to sample and transport ions from the high pressure region into the
MS as efficiently as possible.
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Improving ionization experimentally has usually involved . using acidic conditions 
Another method to improve ionization has been to use quaternary amines when possible
(see Aebersold et al., Protein Science 1:494-503, 1992; Smith et al., Anal. Chem.
60:436-41, 1988).

Electrospray ionization is described in more detail as follows.
Electrospray ion production requires two steps: dispersal of highly charged droplets at
near atmospheric pressure, followed by conditions to induce evaporation. A solution of
analyte molecules is passed through a needle that is kept at high electric potential. At
the end of the needle, the solution disperses into a mist of small highly charged droplets
containing the analyte molecules. The small droplets evaporate quickly and by a
process of field desorption or residual evaporation, protonated protein molecules are
released into the gas phase. An electrospray is generally produced by application of a
high electric field to a small flow of liquid (generally 1-10 uL/min) from a capillary
tube. A potential difference of 3-6 kV is typically applied between the capillary and
counter electrode located 0.2-2 cm away (where ions, charged clusters, and even
charged droplets, depending on the extent of desolvation, may be sampled by the MS
through a small orifice). The electric field results in charge accumulation on the liquid
surface at the capillary terminus; thus the liquid flow rate, resistivity, and surface
tension are important factors in droplet production. The high electric field results in
disruption of the liquid surface and formation of highly charged liquid droplets.
Positively or negatively charged droplets can be produced depending upon the capillary
bias. The negative ion mode requires the presence of an electron scavenger such as
oxygen to inhibit electrical discharge.

A wide range of liquids can be sprayed electrostatically into a vacuum
or with the aid of a nebulizing agent. The use of only electric fields for nebulization
leads to some practical restrictions on the range of liquid conductivity and dielectric
constant. Solution conductivity of less than 10* ohms is required at room temperature
for a stable electrospray at useful liquid flow rates corresponding to an aqueous
electrolyte solution of < 10- M. In the mode found most useful for ESI-MS, an
appropriate liquid flow rate results in dispersion of the liquid as a fine mist. A short
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The ionization methods amenable to nonvolatile biological compounds 
have overlapping ranges of applicability. Ionization efficiencies are highly dependent
on matrix composition and compound type. Currently available results indicate that the
upper molecular mass for TS is about 8000 daltons (Jones and Krolik, Rapid Comm.
Mass Spectrom. 1:67, 1987). Since TS is practiced mainly with quadrupole mass
spectrometers, sensitivity typically suffers disproportionately at higher mass-to-charge
ratios (m/z). Time-of-flight (TOF) mass spectrometers are commercially available and
possess the advantage that the m/z range is limited only by detector efficiency.
Recently, two additional ionization methods have been introduced. These two methods
are now referred to as matrix-assisted laser desorption (MALDI, Karas and Hillenkamp,
Anal. Chem. 60:2299, 1988; Karas et al., Angew. Chem. 101:805, 1989) and
electrospray ionization (ESI). Both methodologies have very high ionization efficiency
(i.e., very high [molecular ions produced]/[molecules consumed]). Sensitivity, which
defines the ultimate potential of the technique, is dependent on sample size, quantity of
ions, flow rate, detection efficiency and actual ionization efficiency.

Electrospray-MS is based on an idea first proposed in the 1960s (Dole et
al., J. Chem. Phys. 49:2240, 1968). Electrospray ionization (ESI) is one means to
produce charged molecules for analysis by mass spectroscopy. Briefly, electrospray
ionization produces highly charged droplets by nebulizing liquids in a strong
electrostatic field. The highly charged droplets, generally formed in a dry bath gas at
atmospheric pressure, shrink by evaporation of neutral solvent until the charge
repulsion overcomes the cohesive forces, leading to a "Coulombic explosion". The
exact mechanism of ionization is controversial and several groups have put forth
hypotheses (Blades et al., Anal. Chem. 63:2109-14, 1991; Kebarle et al., Anal. Chem.
65:A972-86, 1993; Fenn, J. Am. Soc. Mass. Spectrom. 4:524-35, 1993). Regardless of
the ultimate process of ion formation, ESI produces charged molecules from solution
under mild conditions.

The ability to obtain useful mass spectral data on small amounts of an
organic molecule relies on the efficient production of ions. The efficiency of ionization
for ESI is related to the extent of positive charge associated with the molecule.
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Secondary ionization (SIMS) utilizes an ion beam; such as 3 He\ ,6 0\ or 40 Ar + ; is 
focused onto the surface of a sample and sputters material into the gas phase. Spark 
source is a method which ionizes analytes in solid samples by pulsing an electric current 
across two electrodes. 

A tag may become charged prior to, during or after cleavage from the
molecule to which it is attached. Ionization methods based on ion "desorption", the
direct formation or emission of ions from solid or liquid surfaces, have allowed
increasing application to nonvolatile and thermally labile compounds. These methods
eliminate the need for neutral molecule volatilization prior to ionization and generally
minimize thermal degradation of the molecular species. These methods include field
desorption (Beckey, Principles of Field Ionization and Field Desorption Mass
Spectrometry, Pergamon, Oxford, 1977), plasma desorption (Sundqvist and Macfarlane,
Mass Spectrom. Rev. 4:421, 1985), laser desorption (Karas and Hillenkamp, Anal.
Chem. 60:2299, 1988; Karas et al., Angew. Chem. 101:805, 1989), fast particle
bombardment (e.g., fast atom bombardment, FAB, and secondary ion mass
spectrometry, SIMS, Barber et al., Anal. Chem. 54:645A, 1982), and thermospray (TS)
ionization (Vestal, Mass Spectrom. Rev. 2:447, 1983). Thermospray is broadly applied
for the on-line combination with liquid chromatography. The continuous flow FAB
methods (Caprioli et al., Anal. Chem. 58:2949, 1986) have also shown significant
potential. A more complete listing of ionization/mass spectrometry combinations is:
ion-trap mass spectrometry, electrospray ionization mass spectrometry, ion-spray mass
spectrometry, liquid ionization mass spectrometry, atmospheric pressure ionization
mass spectrometry, electron ionization mass spectrometry, metastable atom
bombardment ionization mass spectrometry, fast atom bombardment ionization mass
spectrometry, MALDI mass spectrometry, photo-ionization time-of-flight mass
spectrometry, laser droplet mass spectrometry, MALDI-TOF mass spectrometry, APCI
mass spectrometry, nano-spray mass spectrometry, nebulised spray ionization mass
spectrometry, chemical ionization mass spectrometry, resonance ionization mass
spectrometry, secondary ionization mass spectrometry, thermospray mass spectrometry.

plasma and glow discharge. Plasma is a hot, partially-ionized gas that effectively
excites and ionizes atoms. A glow discharge is a low-pressure plasma maintained
between two electrodes. Electron impact ionization employs an electron beam, usually
generated from a tungsten filament, to ionize gas-phase atoms or molecules. An
electron from the beam knocks an electron off analyte atoms or molecules to create
ions. Electrospray ionization utilizes a very fine needle and a series of skimmers. A
sample solution is sprayed into the source chamber to form droplets. The droplets carry
charge when they exit the capillary and as the solvent vaporizes the droplets disappear
leaving highly charged analyte molecules. ESI is particularly useful for large biological
molecules that are difficult to vaporize or ionize. Fast-atom bombardment (FAB)
utilizes a high-energy beam of neutral atoms, typically Xe or Ar, that strikes a solid
sample causing desorption and ionization. It is used for large biological molecules that
are difficult to get into the gas phase. FAB causes little fragmentation and usually gives
a large molecular ion peak, making it useful for molecular weight determination. The
atomic beam is produced by accelerating ions from an ion source through a
charge-exchange cell. The ions pick up an electron in collisions with neutral atoms to form a
beam of high energy atoms. Laser ionization (LIMS) is a method in which a laser pulse
ablates material from the surface of a sample and creates a microplasma that ionizes
some of the sample constituents. Matrix-assisted laser desorption ionization (MALDI)
is a LIMS method of vaporizing and ionizing large biological molecules such as
proteins or DNA fragments. The biological molecules are dispersed in a solid matrix
such as nicotinic acid. A UV laser pulse ablates the matrix which carries some of the
large molecules into the gas phase in an ionized form so they can be extracted into a
mass spectrometer. Plasma-desorption ionization (PD) utilizes the decay of ²⁵²Cf which
produces two fission fragments that travel in opposite directions. One fragment strikes
the sample knocking out 1-10 analyte ions. The other fragment strikes a detector and
triggers the start of data acquisition. This ionization method is especially useful for
large biological molecules. Resonance ionization (RIMS) is a method in which one or
more laser beams are tuned in resonance to transitions of a gas-phase atom or molecule
to promote it in a stepwise fashion above its ionization potential to create an ion.

In general, a mass spectrometer (MS) consists of an ion source, a mass-selective
analyzer, and an ion detector. The magnetic-sector, quadrupole, and time-of-flight
designs also require extraction and acceleration ion optics to transfer ions from
the source region into the mass analyzer. The details of several mass analyzer designs
(for magnetic-sector MS, quadrupole MS or time-of-flight MS) are discussed below.
Single-focusing analyzers for magnetic-sector MS utilize a particle beam path of 180,
90, or 60 degrees. The various forces influencing the particle separate ions with
different mass-to-charge ratios. With double-focusing analyzers, an electrostatic
analyzer is added in this type of instrument to separate particles with differences in
kinetic energy.

A quadrupole mass filter for quadrupole MS consists of four metal rods
arranged in parallel. The applied voltages affect the trajectory of ions traveling down
the flight path centered between the four rods. For given DC and AC voltages, only
ions of a certain mass-to-charge ratio pass through the quadrupole filter and all other
ions are thrown out of their original path. A mass spectrum is obtained by monitoring
the ions passing through the quadrupole filter as the voltages on the rods are varied.

A time-of-flight mass spectrometer uses the differences in transit time
through a "drift region" to separate ions of different masses. It operates in a pulsed
mode so ions must be produced in pulses and/or extracted in pulses. A pulsed electric
field accelerates all ions into a field-free drift region with a kinetic energy of qV, where
q is the ion charge and V is the applied voltage. Since the ion kinetic energy is
0.5 mv², lighter ions have a higher velocity than heavier ions and reach the detector at
the end of the drift region sooner. The output of an ion detector is displayed on an
oscilloscope as a function of time to produce the mass spectrum.
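
The relation between ion mass and arrival time described above can be illustrated with a minimal sketch. It assumes an ideal field-free drift region and equates qV with 0.5 mv²; the function name, the 20 kV accelerating voltage and the 1 m drift length are illustrative assumptions, not values taken from this description.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per atomic mass unit

def flight_time(mass_da, charge_state=1, voltage=20_000.0, drift_length=1.0):
    """Return the drift time in seconds of an ion accelerated through `voltage`.

    Uses q*V = 0.5*m*v**2, so v = sqrt(2*q*V/m) and t = L/v.
    The default voltage (volts) and drift length (metres) are illustrative only.
    """
    mass_kg = mass_da * DALTON
    charge = charge_state * ELEMENTARY_CHARGE
    velocity = math.sqrt(2.0 * charge * voltage / mass_kg)
    return drift_length / velocity

# A lighter ion reaches the detector before a heavier ion of the same charge.
light = flight_time(250.0)
heavy = flight_time(1000.0)
```

Because the flight time scales with the square root of m/q, an ion four times heavier takes twice as long to traverse the same drift region.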

The ion formation process is the starting point for mass spectrometric
analyses. Chemical ionization is a method that employs a reagent ion to react with the
analyte molecules (tags) to form ions by either a proton or hydride transfer. The reagent
ions are produced by introducing a large excess of methane (relative to the tag) into an
electron impact (EI) ion source. Electron collisions produce CH₄⁺ and CH₃⁺ which
further react with methane to form CH₅⁺ and C₂H₅⁺. Another method to ionize tags is by

usually a tungsten lamp and the detector is usually a PbS solid-state detector. Sample
holders can be glass or quartz and typical solvents are CCl₄ and CS₂. The convenient
instrumentation of NIR spectroscopy makes it suitable for on-line monitoring and
process control.

Ultraviolet and visible absorption spectroscopy (uv-vis) is
the measurement of the wavelength and intensity of absorption of near-ultraviolet and
visible light by a sample. Absorption in the vacuum UV occurs at 100-200 nm
(10⁵-50,000 cm⁻¹), in the quartz UV at 200-350 nm (50,000-28,570 cm⁻¹) and in the
visible at 350-800 nm (28,570-12,500 cm⁻¹), and is described by the Beer-Lambert-Bouguer law.
Ultraviolet and visible light are energetic enough to promote outer electrons to higher
energy levels. UV-vis spectroscopy can usually be applied to molecules and inorganic
ions or complexes in solution. The usefulness of uv-vis spectra is limited by their
broad features. The light source is usually a hydrogen or deuterium lamp for uv
measurements and a tungsten lamp for visible measurements. The wavelengths of these
continuous light sources are selected with a wavelength separator such as a prism or
grating monochromator. Spectra are obtained by scanning the wavelength separator and
quantitative measurements can be made from a spectrum or at a single wavelength.
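
The wavelength and wavenumber ranges quoted above are two expressions of the same quantity. As a minimal sketch (the function name is an illustrative assumption), a wavelength in nanometres converts to a wavenumber in reciprocal centimetres as 10⁷ divided by the wavelength:

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm

# Boundaries of the spectral regions named in the text, in nanometres.
REGION_EDGES_NM = {
    "vacuum UV": (100, 200),
    "quartz UV": (200, 350),
    "visible": (350, 800),
}

for region, (short_nm, long_nm) in REGION_EDGES_NM.items():
    high = nm_to_wavenumber(short_nm)
    low = nm_to_wavenumber(long_nm)
    print(f"{region}: {short_nm}-{long_nm} nm = {high:,.0f}-{low:,.0f} cm^-1")
```

The 350 nm boundary evaluates to about 28,571 cm⁻¹, which the text rounds to 28,570 cm⁻¹.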

Mass spectrometers use the difference in the mass-to-charge ratio (m/z)
of ionized atoms or molecules to separate them from each other. Mass spectrometry is
therefore useful for quantitation of atoms or molecules and also for determining
chemical and structural information about molecules. Molecules have distinctive
fragmentation patterns that provide structural information to identify compounds. The
general operations of a mass spectrometer are as follows. Gas-phase ions are created,
the ions are separated in space or time based on their mass-to-charge ratio, and the
quantity of ions of each mass-to-charge ratio is measured. The ion separation power of
a mass spectrometer is described by the resolution, which is defined as R = m/Δm,
where m is the ion mass and Δm is the difference in mass between two resolvable
peaks in a mass spectrum. For example, a mass spectrometer with a resolution of 1000
can resolve an ion with an m/z of 100.0 from an ion with an m/z of 100.1.
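
The resolution example above can be checked numerically. The following is a minimal sketch under the stated definition R = m/Δm; the function names and the small floating-point tolerance are illustrative assumptions.

```python
def required_resolution(mass_a, mass_b):
    """Resolution R = m / delta_m needed to separate two peaks (m taken as the lower mass)."""
    delta = abs(mass_a - mass_b)
    if delta == 0:
        raise ValueError("identical masses cannot be resolved")
    return min(mass_a, mass_b) / delta

def can_resolve(resolving_power, mass_a, mass_b, rel_tol=1e-9):
    """True if an instrument with the given resolution separates the two peaks.

    rel_tol absorbs binary floating-point error in values such as 100.1.
    """
    return resolving_power >= required_resolution(mass_a, mass_b) * (1.0 - rel_tol)

print(can_resolve(1000, 100.0, 100.1))    # the example from the text
print(can_resolve(1000, 1000.0, 1000.1))  # same 0.1 spacing at ten times the mass
```

The second call shows why resolution matters more at higher mass: the same 0.1 unit spacing at m/z 1000 needs a resolution of about 10,000.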
Atoms or molecules that are excited to high energy levels can decay to
lower levels by emitting radiation. This light emission is called fluorescence if the
transition is between states of the same spin, and phosphorescence if the transition
occurs between states of different spin. The emission intensity of an analyte is linearly
proportional to concentration (at low concentrations), and is useful for quantifying the
emitting species. Specific emission spectroscopic methods include atomic emission
spectroscopy (AES), atomic fluorescence spectroscopy (AFS), molecular laser-induced
fluorescence (LIF), and X-ray fluorescence (XRF).

When electromagnetic radiation passes through matter, most of the
radiation continues in its original direction but a small fraction is scattered in other
directions. Light that is scattered at the same wavelength as the incoming light is called
Rayleigh scattering. Light that is scattered in transparent solids due to vibrations
(phonons) is called Brillouin scattering. Brillouin scattering is typically shifted by 0.1
to 1 wavenumber from the incident light. Light that is scattered due to vibrations in
molecules or optical phonons in opaque solids is called Raman scattering. Raman
scattered light is shifted by as much as 4000 wavenumbers from the incident light.
Specific scattering spectroscopic methods include Raman spectroscopy.

IR spectroscopy is the measurement of the wavelength and intensity of
the absorption of mid-infrared light by a sample. Mid-infrared light (2.5-50 µm,
4000-200 cm⁻¹) is energetic enough to excite molecular vibrations to higher energy levels.
The wavelengths of IR absorption bands are characteristic of specific types of chemical
bonds and IR spectroscopy is generally most useful for identification of organic and
organometallic molecules.

Near-infrared absorption spectroscopy (NIR) is the measurement of the
wavelength and intensity of the absorption of near-infrared light by a sample.
Near-infrared light spans the 800 nm - 2.5 µm (12,500 - 4000 cm⁻¹) range and is energetic
enough to excite overtones and combinations of molecular vibrations to higher energy
levels. NIR spectroscopy is typically used for quantitative measurement of organic
functional groups, especially O-H, N-H, and C=O. The components and design of NIR
instrumentation are similar to uv-vis absorption spectrometers. The light source is

Description

METHODS AND COMPOSITIONS FOR DETERMINING 
THE SEQUENCE OF NUCLEIC ACID MOLECULES 

TECHNICAL FIELD 

The present invention relates generally to methods and compositions for
determining the sequence of nucleic acid molecules, and more specifically, to methods
and compositions which allow the determination of multiple nucleic acid sequences
simultaneously.



BACKGROUND OF THE INVENTION 

Deoxyribonucleic acid (DNA) sequencing is one of the basic techniques
of biology. It is at the heart of molecular biology and plays a rapidly expanding role in
the rest of biology. The Human Genome Project is a multi-national effort to read the
entire human genetic code. It is the largest project ever undertaken in biology, and has
already begun to have a major impact on medicine. The development of cheaper and
faster sequencing technology will ensure the success of this project. Indeed, a
substantial effort has been funded by the NIH and DOE branches of the Human
Genome Project to improve sequencing technology, although without a substantial
impact on current practices (Sulston and Waterston, Nature 376:175, 1995).

In the past two decades, determination and analysis of nucleic acid
sequence has formed one of the building blocks of biological research. This, along with
new investigational tools and methodologies, has allowed scientists to study genes and
gene products in order to better understand the function of these genes, as well as to
develop new therapeutics and diagnostics.

Two different DNA sequencing methodologies that were developed in
1977 are still in wide use today. Briefly, the enzymatic method described by Sanger
(Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy-terminators,
involves the synthesis of a DNA strand from a single-stranded template by a DNA
polymerase. The Sanger method of sequencing depends on the fact that
dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way
as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ
from normal deoxynucleotides (dNTPs) in that they lack the 3'-OH group necessary for
chain elongation. When a ddNTP is incorporated into the DNA chain, the absence of
the 3'-hydroxy group prevents the formation of a new phosphodiester bond and the
DNA fragment is terminated with the ddNTP complementary to the base in the template
DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci.
(USA) 74:560, 1977) employs a chemical degradation method of the original DNA (in
both cases the DNA must be clonal). Both methods produce populations of fragments
that begin from a particular point and terminate at every base that is found in the DNA
fragment that is to be sequenced. The termination of each fragment is dependent on the
location of a particular base within the original DNA fragment. The DNA fragments
are separated by polyacrylamide gel electrophoresis and the order of the DNA bases
(adenine, cytosine, thymine, guanine; also known as A, C, T, G, respectively) is read from
an autoradiograph of the gel.
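
The way a nested set of terminated fragments encodes the base order can be sketched in a few lines. This is an idealized illustration that assumes exactly one fragment terminates at every position; the function names and the example sequence are hypothetical.

```python
def sanger_fragments(synthesized_strand):
    """Group every chain-terminated prefix of the strand by its terminating base.

    Each of the four lists stands for one dideoxy-terminator reaction.
    """
    reactions = {"A": [], "C": [], "G": [], "T": []}
    for length in range(1, len(synthesized_strand) + 1):
        fragment = synthesized_strand[:length]
        reactions[fragment[-1]].append(fragment)
    return reactions

def read_gel(reactions):
    """Order all fragments by length, as a gel does, and read each terminal base."""
    bands = [(len(frag), base) for base, frags in reactions.items() for frag in frags]
    return "".join(base for _, base in sorted(bands))

reactions = sanger_fragments("GATTACA")
print(read_gel(reactions))
```

Sorting by length stands in for electrophoretic separation: the shortest fragment reports the first base after the primer and each longer fragment reports the next one.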

A cumbersome DNA pooling sequencing strategy (Church and Kieffer-Higgins,
Science 240:185, 1988) is one of the more recent approaches to DNA
sequencing. A pooling sequencing strategy consists of pooling a number of DNA
templates (samples) and processing the samples as pools. In order to separate the
sequence information at the end of the processing, the DNA molecules of interest are
ligated to a set of oligonucleotide "tags" at the beginning. The tagged DNA molecules
are pooled, amplified and chemically fragmented in 96-well plates. After
electrophoresis of the pooled samples, the DNA is transferred to a solid support and
then hybridized with a sequential series of specific labeled oligonucleotides. These
membranes are then probed as many times as there are tags in the original pool,
producing, in each set of probing, autoradiographs similar to those from standard DNA
sequencing methods. Thus each reaction and gel yields a quantity of data equivalent to
that obtained from conventional reactions and gels multiplied by the number of probes
used. If alkaline phosphatase is used as the reporter enzyme, a 1,2-dioxetane substrate
can be used which is detected in a chemiluminescent assay format. However, this
pooling strategy's major disadvantage is that the sequences can only be read by
Southern blotting the sequencing gel and hybridizing this membrane once for each
clone in the pool.

In addition to advances in sequencing methodologies, advances in speed
have occurred due to the advent of automated DNA sequencing. Briefly, these methods
use fluorescent-labeled primers, which replace the radiolabeled components employed
by earlier methods. Fluorescent dyes are attached either to the sequencing primers or the
ddNTP-terminators. Robotic components now utilize polymerase chain reaction (PCR)
technology, which has led to the development of linear amplification strategies.
Current commercial sequencing allows all 4 dideoxy-terminator reactions to be run on a
single lane. Each dideoxy-terminator reaction is represented by a unique fluorescent
primer (one fluorophore for each base type: A, T, C, G). Only one template DNA (i.e.,
DNA sample) is represented per lane. Current gels permit the simultaneous
electrophoresis of up to 64 samples in 64 different lanes. Different ddNTP-terminated
fragments are detected by the irradiation of the gel lane by light followed by detection
of emitted light from the fluorophore. Each electrophoresis step is about 4-6 hours
long. Each electrophoresis separation resolves about 400-600 nucleotides (nt);
therefore, about 6000 nt can be sequenced per hour per sequencer.
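
The throughput figure follows from the lane count, read length and run time given above. A minimal sketch of the arithmetic, assuming mid-range values of 500 nt per lane and a 5 hour run (the function name is illustrative):

```python
def throughput_nt_per_hour(lanes, read_length_nt, run_hours):
    """Nucleotides read per hour by one sequencer running all lanes in parallel."""
    return lanes * read_length_nt / run_hours

typical = throughput_nt_per_hour(lanes=64, read_length_nt=500, run_hours=5)
slowest = throughput_nt_per_hour(lanes=64, read_length_nt=400, run_hours=6)
fastest = throughput_nt_per_hour(lanes=64, read_length_nt=600, run_hours=4)
print(typical, slowest, fastest)
```

The mid-range case gives 6400 nt per hour, consistent with the figure of about 6000 quoted in the text, and the extremes of the stated ranges bracket it.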

The use of mass spectrometry for the study of monomeric constituents of
nucleic acids has also been described (Hignite, In Biochemical Applications of Mass
Spectrometry, Waller and Dermer (eds.), Wiley-Interscience, Chapter 16, p. 527, 1972).
Briefly, for larger oligomers, significant early success was obtained by plasma
desorption for protected synthetic oligonucleotides up to 14 bases long, and for
unprotected oligos up to 4 bases in length. As with proteins, the applicability of
ESI-MS to oligonucleotides has been demonstrated (Covey et al., Rapid Comm. in Mass
Spec. 2:249-256, 1988). These species are ionized in solution, with the charge residing
at the acidic bridging phosphodiester and/or terminal phosphate moieties, and yield in
the gas phase multiply charged molecular anions, in addition to sodium adducts.

Sequencing DNA with <100 bases by the common enzymatic ddNTP
technique is more complicated than it is for larger DNA templates, so that chemical
degradation is sometimes employed. However, the chemical decomposition method
requires about 50 pmol of radioactive ³²P end-labeled material, 6 chemical steps,
electrophoretic separation, and film exposure. For small oligonucleotides (<14 nts) the
combination of electrospray ionization (ESI) and Fourier transform (FT) mass
spectrometry (MS) is far faster and more sensitive. Dissociation products of
multiply-charged ions measured at high (10⁵) resolving power represent consecutive backbone
cleavages providing the full sequence in less than one minute on sub-picomole quantity
of sample (Little et al., J. Am. Chem. Soc. 116:4893, 1994). For molecular weight
measurements, ESI/MS has been extended to larger fragments (Potier et al., Nuc. Acids
Res. 22:3895, 1994). ESI/FTMS appears to be a valuable complement to classical
methods for sequencing and pinpointing mutations in nucleotides as large as 100-mers.
Spectral data have recently been obtained loading 3 x 10⁻¹³ mol of a 50-mer using a
more sensitive ESI source (Valaskovic, Anal. Chem. 68:259, 1995).

The other approach to DNA sequencing by mass spectrometry is one in
which DNA is labeled with individual isotopes of an element and the mass spectral
analysis simply has to distinguish the isotopes after a mixture of sizes of DNA has
been separated by electrophoresis. (The other approach described above utilizes the
resolving power of the mass spectrometer to both separate and detect the DNA
oligonucleotides of different lengths, a difficult proposition at best.) All of the
procedures described below employ the Sanger procedure to convert a sequencing
primer to a series of DNA fragments that vary in length by one nucleotide. The
enzymatically synthesized DNA molecules each contain the original primer, a replicated
sequence of part of the DNA of interest, and the dideoxy terminator. That is, a set of
DNA molecules is produced that contain the primer and differ in length from each
other by one nucleotide residue.

Brennen et al. (Biol. Mass Spec., New York, Elsevier, p. 219, 1990) have
described methods to use the four stable isotopes of sulfur as DNA labels that enable
one to detect DNA fragments that have been separated by capillary electrophoresis.
Using the α-thio analogues of the ddNTPs, a single sulfur isotope is incorporated into
each of the DNA fragments. Therefore each of the four types of DNA fragments
(ddTTP, ddATP, ddGTP, ddCTP-terminated) can be uniquely labeled according to the
terminal nucleotide; for example, ³²S for fragments ending in A, ³³S for G, ³⁴S for C, and
³⁶S for T, and mixed together for electrophoresis. From the column, fractions of a few
picoliters are obtained by a modified ink-jet printer head, and then subjected to complete
combustion in a furnace. This process oxidizes the thiophosphates of the labeled DNA
to SO₂, which is subjected to analysis in a quadrupole or magnetic sector mass
spectrometer. The SO₂ mass unit representation is 64 for fragments ending in A, 65 for
G, 66 for C, and 68 for T. Maintenance of the resolution of the DNA fragments as they
emerge from the column depends on taking sufficiently small fractions. Because the
mass spectrometer is coupled directly to the capillary gel column, the rate of analysis is
determined by the rate of electrophoresis. This process is unfortunately expensive,
liberates radioactive gas and has not been commercialized. Two other basic constraints
also operate on this approach: (a) no other components with mass of 64, 65, 66, or 68
(isobaric contaminants) can be tolerated and (b) the % natural abundances of the sulfur
isotopes (³²S is 95.0, ³³S is 0.75, ³⁴S is 4.2, and ³⁶S is 0.11) govern the sensitivity and
cost. Since ³²S is 95% naturally abundant, the other isotopes must be enriched to >99%
to eliminate contaminating ³²S. Isotopes that are <1% abundant are quite expensive to
obtain at 99% enrichment; even when ³⁶S is purified 100-fold it contains as much or
more ³²S as it does ³⁶S.
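
The SO₂ masses of 64, 65, 66 and 68 quoted above correspond to sulfur mass numbers 32, 33, 34 and 36 combined with two atoms of oxygen-16. A minimal sketch of that bookkeeping follows; it uses nominal integer masses only, and the variable and function names are illustrative assumptions.

```python
OXYGEN_16 = 16

# Sulfur isotope assumed for each terminating base, inferred from the SO2 masses in the text.
SULFUR_LABELS = {"A": 32, "G": 33, "C": 34, "T": 36}

def so2_mass(sulfur_mass_number):
    """Nominal mass of SO2 formed by combustion of one labeled thiophosphate."""
    return sulfur_mass_number + 2 * OXYGEN_16

MASS_TO_BASE = {so2_mass(isotope): base for base, isotope in SULFUR_LABELS.items()}

def call_bases(observed_masses):
    """Translate SO2 masses, in order of elution from the column, into a base sequence."""
    return "".join(MASS_TO_BASE[mass] for mass in observed_masses)

print(MASS_TO_BASE)
print(call_bases([65, 64, 68, 68, 64, 66, 64]))
```

A lookup of this kind fails for any mass outside the four expected values, which mirrors the constraint that isobaric contaminants at 64, 65, 66 or 68 cannot be tolerated.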

Gilbert has described an automated DNA sequencer (EPA 92108678.2)
that consists of an oligomer synthesizer, an array on a membrane, a detector which
detects hybridization and a central computer. The synthesizer synthesizes and labels
multiple oligomers of arbitrary predicted sequence. The oligomers are used to probe
immobilized DNA on membranes. The detector identifies hybridization patterns and
then sends those patterns to a central computer which constructs a sequence and then
predicts the sequence of the next round of synthesis of oligomers. Through an iterative
process, a DNA sequence can be obtained in an automated fashion.

Brennen has described a method for sequencing nucleic acids based on
ligation of oligomers (U.S. Patent No. 5,403,708). Methods and compositions are
described for forming a ligation product hybridized to a nucleic acid template. A primer
is hybridized to a DNA template and then a pool of random extension oligonucleotides
is also hybridized to the primed template in the presence of ligase(s). The ligase enzyme
covalently ligates the hybridized oligomers to the primer. Modifications permit the
determination of the nucleotide sequence of one or more members of a first set of target
nucleotide residues in the nucleic acid template that are spaced at intervals of N
nucleotides. In this method, the labeled ligated product is formed wherein the position
and type of label incorporated into the ligation product provides information concerning
the nucleotide residue in the nucleic acid template with which the labeled nucleotide
residue is base paired.

Koster has described a method for sequencing DNA by mass
spectrometry after degradation of DNA by an exonuclease (PCT/US94/02938). The
method described is simple in that DNA sequence is directly determined (the Sanger
reaction is not used). DNA is cloned into standard vectors, the 5' end is immobilized
and the strands are then sequentially degraded at the 3' end via an exonuclease and the
enzymatic products (nucleotides) are detected by mass spectrometry.

Weiss et al. have described an automated hybridization/imaging device
for fluorescent multiplex DNA sequencing (PCT/US94/11918). The method is based
on the concept of hybridizing enzyme-linked probes to a membrane containing size
separated DNA fragments arising from a typical Sanger reaction.

The demand for sequencing information is larger than can be supplied by
the currently existing sequencing machines, such as the ABI377 and the Pharmacia
ALF. One of the principal limitations of the current technology is the small number of
tags which can be resolved using the current tagging system. The Church pooling
system discussed above uses more tags, but the use and detection of these tags is
laborious.

The present invention discloses novel compositions and methods which
may be utilized to sequence nucleic acid molecules with greatly increased speed and
sensitivity compared with the methods described above, and further provides other
related advantages.

SUMMARY OF THE INVENTION

Briefly stated, the present invention provides methods, compounds,
compositions, kits and systems for determining the sequence of nucleic acid molecules.
Within one aspect of the invention, methods are provided for determining the sequence
of a nucleic acid molecule. The methods include the steps of: (a) generating tagged
nucleic acid fragments which are complementary to a selected target nucleic acid
molecule, wherein a tag is correlative with a particular nucleotide and detectable by
non-fluorescent spectrometry or potentiometry; (b) separating the tagged fragments by
sequential length; (c) cleaving the tags from the tagged fragments; and (d) detecting the
tags by non-fluorescent spectrometry or potentiometry, and therefrom determining the
sequence of the nucleic acid molecule. In preferred embodiments, the tags are detected
by mass spectrometry, infrared spectrometry, ultraviolet spectrometry or potentiostatic
amperometry.
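
Steps (b) through (d) can be pictured as a lookup from detected tag to nucleotide, applied in order of fragment length. The sketch below is purely illustrative: the tag masses, the sample names, the 0.5 dalton matching tolerance and the function name are hypothetical and are not taken from this description.

```python
from collections import defaultdict

# Hypothetical tag masses in daltons, one tag per (sample, terminating nucleotide).
TAG_TABLE = {
    301.0: ("sample_1", "A"), 303.0: ("sample_1", "C"),
    305.0: ("sample_1", "G"), 307.0: ("sample_1", "T"),
    311.0: ("sample_2", "A"), 313.0: ("sample_2", "C"),
    315.0: ("sample_2", "G"), 317.0: ("sample_2", "T"),
}

def decode(detections, tag_table=TAG_TABLE, tolerance=0.5):
    """Turn (fragment_length, detected_tag_mass) pairs into one sequence per sample."""
    calls = defaultdict(list)
    for _length, mass in sorted(detections):        # step (b): order by fragment length
        for tag_mass, (sample, base) in tag_table.items():
            if abs(mass - tag_mass) <= tolerance:   # step (d): identify the cleaved tag
                calls[sample].append(base)
                break
    return {sample: "".join(bases) for sample, bases in calls.items()}

detections = [(3, 307.2), (1, 313.0), (2, 301.0), (1, 305.1), (3, 311.0), (2, 312.9)]
result = decode(detections)
print(result)
```

Because every tag is unique to one sample and one nucleotide, fragments from several sequencing reactions can be separated together and still be read out independently.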

In another aspect, the invention provides a compound of the formula:

T^ms-L-X

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at
least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen,
sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing
moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing
moiety comprises a functional group which supports a single ionized charge state when
the compound is subjected to mass spectrometry and is selected from tertiary amine,
quaternary amine and organic acid; X is a functional group selected from hydroxyl,
amino, thiol, carboxylic acid, haloalkyl, and derivatives thereof which either activate or
inhibit the activity of the group toward coupling with other moieties, or is a nucleic acid
fragment attached to L at other than the 3' end of the nucleic acid fragment; with the
provisos that the compound is not bonded to a solid support through X nor has a mass
of less than 250 daltons.

In another aspect, the invention provides a composition comprising a
plurality of compounds of the formula T^ms-L-MOI, wherein T^ms is an organic group
detectable by mass spectrometry, comprising carbon, at least one of hydrogen and
fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and
iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises
a functional group which supports a single ionized charge state when the compound
is subjected to mass spectrometry and is selected from tertiary amine, quaternary
amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to
the MOI at a location other than the 3' end of the MOI; and wherein no two
compounds have either the same T^ms or the same MOI.

In another aspect, the invention provides a composition comprising
water and a compound of the formula T^ms-L-MOI, wherein T^ms is an organic group
detectable by mass spectrometry, comprising carbon, at least one of hydrogen and
fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and
iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises
a functional group which supports a single ionized charge state when the compound
is subjected to mass spectrometry and is selected from tertiary amine, quaternary
amine and organic acid; and MOI is a nucleic acid fragment wherein L is conjugated
to the MOI at a location other than the 3' end of the MOI.

In another aspect, the invention provides for a composition
comprising a plurality of sets of compounds, each set of compounds having the
formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass
spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional
atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an
organic group which allows a T^ms-containing moiety to be cleaved from the
remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary
amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to
the MOI at a location other than the 3' end of the MOI; wherein within a set, all
members have the same T^ms group, and the MOI fragments have variable lengths
that terminate with the same dideoxynucleotide selected from ddAMP, ddGMP,
ddCMP and ddTMP; and wherein between sets, the T^ms groups differ by at least 2
amu.
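
The requirement that tag groups in different sets differ by at least 2 amu amounts to a pairwise spacing check. A minimal sketch follows; the function name and the example masses are hypothetical.

```python
from itertools import combinations

def tags_are_resolvable(tag_masses, min_spacing=2.0):
    """True if every pair of tag masses differs by at least min_spacing amu."""
    return all(abs(a - b) >= min_spacing for a, b in combinations(tag_masses, 2))

print(tags_are_resolvable([301.0, 303.0, 305.0, 307.0]))  # spaced 2 amu apart
print(tags_are_resolvable([301.0, 302.0, 305.0, 307.0]))  # one pair only 1 amu apart
```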



In another aspect, the invention provides for a composition
comprising a first plurality of sets of compounds as described in the preceding
paragraph, in combination with a second plurality of sets of compounds having the
formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass
spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional
atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an
organic group which allows a T^ms-containing moiety to be cleaved from the
remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary
amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to
the MOI at a location other than the 3' end of the MOI; and wherein all members
within the second plurality have an MOI sequence which terminates with the same
dideoxynucleotide selected from ddAMP, ddGMP, ddCMP and ddTMP; with the
proviso that the dideoxynucleotide present in the compounds of the first plurality is
not the same dideoxynucleotide present in the compounds of the second plurality.

In another aspect, the invention provides for a kit for DNA
sequencing analysis. The kit comprises a plurality of container sets, each container
set comprising at least five containers, wherein a first container contains a vector, a
second, third, fourth and fifth containers contain compounds of the formula
T^ms-L-MOI wherein T^ms is an organic group detectable by mass spectrometry,
comprising carbon, at least one of hydrogen and fluoride, and optional atoms
selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic
group which allows a T^ms-containing moiety to be cleaved from the remainder of the
compound, wherein the T^ms-containing moiety comprises a functional group which
supports a single ionized charge state when the compound is subjected to mass
spectrometry and is selected from tertiary amine, quaternary amine and organic
acid; and MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; such that the MOI for the second, third,
fourth and fifth containers is identical and complementary to a portion of the vector
within the set of containers, and the T^ms group within each container is different
from the other T^ms groups in the kit.

In another aspect, the invention provides for a system for determining the 
sequence of a nucleic acid molecule. The system comprises a separation apparatus that 
separates tagged nucleic acid fragments, an apparatus that cleaves from a tagged nucleic 
acid fragment a tag which is correlative with a particular nucleotide and detectable by 
electrochemical detection, and an apparatus for potentiostatic amperometry.

Within other embodiments of the invention, 4, 5, 10, 15, 20, 25, 30, 35, 
40, 50, 60, 70, 80, 90, 100, 200, 250, 300, 350, 400, 450 or greater than 500 different 
and unique tagged molecules may be utilized within a given reaction simultaneously, 
wherein each tag is unique for a selected nucleic acid fragment, probe, or first or second 
member, and may be separately identified.

These and other aspects of the present invention will become evident 
upon reference to the following detailed description and attached drawings. In addition, 
various references are set forth below which describe in more detail certain procedures 
or compositions (e.g., plasmids, etc.), and are therefore incorporated by reference in 
their entirety.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts the flowchart for the synthesis of pentafluorophenyl 
esters of chemically cleavable mass spectroscopy tags, to liberate tags with carboxyl 
amide termini.

Figure 2 depicts the flowchart for the synthesis of pentafluorophenyl 
esters of chemically cleavable mass spectroscopy tags, to liberate tags with carboxyl 
acid termini. 

Figures 3-6 and 8 depict the flowchart for the synthesis of 
tetrafluorophenyl esters of a set of 36 photochemically cleavable mass spec. tags.

Figure 7 depicts the flowchart for the synthesis of a set of 36 amine-
terminated photochemically cleavable mass spectroscopy tags.

Figure 9 depicts the synthesis of 36 photochemically cleavable mass
spectroscopy tagged oligonucleotides made from the corresponding set of 36 
tetrafluorophenyl esters of photochemically cleavable mass spectroscopy tag acids. 

Figure 10 depicts the synthesis of 36 photochemically cleavable mass 
spectroscopy tagged oligonucleotides made from the corresponding set of 36 amine- 
terminated photochemically cleavable mass spectroscopy tags. 

Figure 11 illustrates the simultaneous detection of multiple tags by mass
spectrometry.

Figure 12 shows the mass spectrogram of the alpha-cyano matrix alone.

Figure 13 depicts a modularly-constructed tagged nucleic acid fragment.



DETAILED DESCRIPTION OF THE INVENTION 

Briefly stated, in one aspect the present invention provides compounds 
wherein a molecule of interest, or precursor thereto, is linked via a labile bond (or labile 
bonds) to a tag. Thus, compounds of the invention may be viewed as having the general
formula:

T-L-X 



wherein T is the tag component, L is the linker component that either is, or contains, a 
labile bond, and X is either the molecule of interest (MOI) component or a functional
group component (L h ) through which the MOI may be joined to T-L. Compounds of 
the invention may therefore be represented by the more specific general formulas: 






T-L-MOI and T-L-L h 



For reasons described in detail below, sets of T-L-MOI compounds may 
be purposely subjected to conditions that cause the labile bond(s) to break, thus 
releasing a tag moiety from the remainder of the compound. The tag moiety is then 
characterized by one or more analytical techniques, to thereby provide direct 
information about the structure of the tag moiety, and (most importantly) indirect
information about the identity of the corresponding MOI.

As a simple illustrative example of a representative compound of the
invention wherein L is a direct bond, reference is made to the following structure (i): 




[Structure (i): a tag component joined to a molecule of interest component (nucleic acid fragment)]

In structure (i), T is a nitrogen-containing polycyclic aromatic moiety bonded to a 
carbonyl group, X is a MOI (and specifically a nucleic acid fragment terminating in an 
amine group), and L is the bond which forms an amide group. The amide bond is labile 
relative to the bonds in T because, as recognized in the art, an amide bond may be 
chemically cleaved (broken) by acid or base conditions which leave the bonds within
the tag component unchanged. Thus, a tag moiety (i.e., the cleavage product that
contains T) may be released as shown below:

[Reaction scheme: treatment of structure (i) with acid or base releases the tag moiety as a carboxylic acid, leaving the amine-terminated nucleic acid fragment as the remainder of the compound]



However, the linker L may be more than merely a direct bond, as shown 
in the following illustrative example, where reference is made to another representative 
compound of the invention having the structure (ii) shown below:



[Structure (ii): the tag T joined through a linker L, which contains an ortho-nitrobenzylamine moiety, to the MOI (nucleic acid fragment)]



It is well-known that compounds having an ortho-nitrobenzylamine moiety (see boxed
atoms within structure (ii)) are photolytically unstable, in that exposure of such
compounds to actinic radiation of a specified wavelength will cause selective cleavage
of the benzylamine bond (see bond denoted with heavy line in structure (ii)). Thus,
structure (ii) has the same T and MOI groups as structure (i); however, the linker group
contains multiple atoms and bonds within which there is a particularly labile bond.
Photolysis of structure (ii) thus releases a tag moiety (T-containing moiety) from the
remainder of the compound, as shown below.




[Reaction scheme: photolysis of structure (ii) yields the tag moiety and the remainder of the compound]

The invention thus provides compounds which, upon exposure to 
appropriate cleavage conditions, undergo a cleavage reaction so as to release a tag 
moiety from the remainder of the compound. Compounds of the invention may be 
described in terms of the tag moiety, the MOI (or precursor thereto, L h ), and the labile
bond(s) which join the two groups together. Alternatively, the compounds of the
invention may be described in terms of the components from which they are formed. 
Thus, the compounds may be described as the reaction product of a tag reactant, a linker 
reactant and a MOI reactant, as follows. 

The tag reactant consists of a chemical handle (T h ) and a variable 
component (T vc ), so that the tag reactant is seen to have the general structure:

T vc -T h 

To illustrate this nomenclature, reference may be made to structure (iii), which shows a 
tag reactant that may be used to prepare the compound of structure (ii). The tag reactant
having structure (iii) contains a tag variable component and a tag handle, as shown
below: 



[Structure (iii): a tag variable component bonded to a tag handle]

In structure (iii), the tag handle (-C(=O)-A) simply provides an avenue
for reacting the tag reactant with the linker reactant to form a T-L moiety. The group
"A" in structure (iii) indicates that the carboxyl group is in a chemically active state, so
it is ready for coupling with other handles. "A" may be, for example, a hydroxyl group
or pentafluorophenoxy, among many other possibilities. The invention provides for a
large number of possible tag handles which may be bonded to a tag variable component,
as discussed in detail below. The tag variable component is thus a part of "T" in the
formula T-L-X, and will also be part of the tag moiety that forms from the reaction that 
cleaves L. 

As also discussed in detail below, the tag variable component is so- 
named because, in preparing sets of compounds according to the invention, it is desired 
that members of a set have unique variable components, so that the individual members 
may be distinguished from one another by an analytical technique. As one example, the 
tag variable component of structure (iii) may be one member of the following set, where 
members of the set may be distinguished by their UV or mass spectra:

[Structures: set of tag variable components]





Likewise, the linker reactant may be described in terms of its chemical 
handles (there are necessarily at least two, each of which may be designated as L h ) 
which flank a linker labile component, where the linker labile component consists of the
required labile moiety (L 2 ) and optional labile moieties (L 1 and L 3 ), where the optional 
labile moieties effectively serve to separate L 2 from the handles L h , and the required 
labile moiety serves to provide a labile bond within the linker labile component. Thus, 
the linker reactant may be seen to have the general formula: 



L h -L 1 -L 2 -L 3 -L h

The nomenclature used to describe the linker reactant may be illustrated 
in view of structure (iv), which again draws from the compound of structure (ii): 

[Structure (iv): a linker reactant with a linker handle at each end]



As structure (iv) illustrates, atoms may serve in more than one functional 
role. Thus, in structure (iv), the benzyl nitrogen functions as a chemical handle in 
allowing the linker reactant to join to the tag reactant via an amide-forming reaction,
and subsequently also serves as a necessary part of the structure of the labile moiety L 2
in that the benzylic carbon-nitrogen bond is particularly susceptible to photolytic
cleavage. Structure (iv) also illustrates that a linker reactant may have an L 3 group (in
this case, a methylene group), although not have an L 1 group. Likewise, linker reactants
may have an L 1 group but not an L 3 group, or may have L 1 and L 3 groups, or may have
neither an L 1 nor an L 3 group. In structure (iv), the presence of the group "P" next to the
carbonyl group indicates that the carbonyl group is protected from reaction. Given this
configuration, the activated carboxyl group of the tag reactant (iii) may cleanly react
with the amine group of the linker reactant (iv) to form an amide bond and give a
compound of the formula T-L-L h .

The MOI reactant is a suitably reactive form of a molecule of interest. 
Where the molecule of interest is a nucleic acid fragment, a suitable MOI reactant is a
nucleic acid fragment bonded through its 5' hydroxyl group to a phosphodiester group
and then to an alkylene chain that terminates in an amino group. This amino group may
then react with the carbonyl group of structure (iv) (after, of course, deprotecting the
carbonyl group, and preferably after subsequently activating the carbonyl group toward
reaction with the amine group) to thereby join the MOI to the linker.

When viewed in a chronological order, the invention is seen to take a tag 
reactant (having a chemical tag handle and a tag variable component), a linker reactant 
(having two chemical linker handles, a required labile moiety and 0-2 optional labile 
moieties) and a MOI reactant (having a molecule of interest component and a chemical 
molecule of interest handle) to form T-L-MOI. Thus, to form T-L-MOI, either the tag
reactant and the linker reactant are first reacted together to provide T-L-L h , and then the
MOI reactant is reacted with T-L-L h so as to provide T-L-MOI, or else (less preferably)
the linker reactant and the MOI reactant are reacted together first to provide L h -L-MOI,
and then L h -L-MOI is reacted with the tag reactant to provide T-L-MOI. For purposes
of convenience, compounds having the formula T-L-MOI will be described in terms of
the tag reactant, the linker reactant and the MOI reactant which may be used to form 
such compounds. Of course, the same compounds of formula T-L-MOI could be 
prepared by other (typically, more laborious) methods, and still fall within the scope of 
the inventive T-L-MOI compounds.

In any event, the invention provides that a T-L-MOI compound be
subjected to cleavage conditions, such that a tag moiety is released from the remainder
of the compound. The tag moiety will comprise at least the tag variable component,
and will typically additionally comprise some or all of the atoms from the tag handle,
some or all of the atoms from the linker handle that was used to join the tag reactant to
the linker reactant, the optional labile moiety L 1 if this group was present in T-L-MOI,
and will perhaps contain some part of the required labile moiety L 2 depending on the
precise structure of L 2 and the nature of the cleavage chemistry. For convenience, the
tag moiety may be referred to as the T-containing moiety because T will typically
constitute the major portion (in terms of mass) of the tag moiety.

Given this introduction to one aspect of the present invention, the 
various components T, L and X will be described in detail. This description begins with 
the following definitions of certain terms, which will be used hereinafter in describing 
T, L and X.

As used herein, the term "nucleic acid fragment" means a molecule
which is complementary to a selected target nucleic acid molecule (i.e., complementary
to all or a portion thereof), and may be derived from nature or synthetically or
recombinantly produced, including non-naturally occurring molecules, and may be in
double or single stranded form where appropriate; and includes an oligonucleotide (e.g.,
DNA or RNA), a primer, a probe, a nucleic acid analog (e.g., PNA), an oligonucleotide
which is extended in a 5' to 3' direction by a polymerase, a nucleic acid which is cleaved
chemically or enzymatically, a nucleic acid that is terminated with a dideoxy terminator
or capped at the 3' or 5' end with a compound that prevents polymerization at the 5' or 3'
end, and combinations thereof. The complementarity of a nucleic acid fragment to a
selected target nucleic acid molecule generally means the exhibition of at least about
70% specific base pairing throughout the length of the fragment. Preferably the nucleic
acid fragment exhibits at least about 80% specific base pairing; and most preferably at
least about 90%. Assays for determining the percent mismatch (and thus the percent
specific base pairing) are well known in the art and are based upon the percent
mismatch as a function of the Tm when referenced to the fully base paired control.
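
The percent specific base pairing referred to above can be illustrated by a direct positional count. The following is a minimal sketch in Python and is not the Tm-based assay mentioned in the text; it assumes DNA only (Watson-Crick pairs), equal-length sequences written 5' to 3', and an antiparallel alignment. The function names are hypothetical, and the hard cutoffs of 70, 80 and 90 stand in for the "at least about" values of the definition.

```python
# Watson-Crick partners for DNA. This is an assumption of the sketch; RNA, PNA
# and other analogs named in the definition are not modeled.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_specific_base_pairing(fragment, target_site):
    """Percent of fragment positions paired with the target site.

    Both strings are written 5'->3'; the fragment is aligned antiparallel
    to the target site, so the target is read in reverse.
    """
    if not fragment or len(fragment) != len(target_site):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(PAIRS.get(f) == t for f, t in zip(fragment, reversed(target_site)))
    return 100.0 * paired / len(fragment)


def complementarity_level(percent):
    """Map a percentage onto the tiers of the definition (hard cutoffs assumed)."""
    if percent >= 90:
        return "most preferred"
    if percent >= 80:
        return "preferred"
    if percent >= 70:
        return "complementary"
    return "not complementary"


perfect = percent_specific_base_pairing("AAGGCCTTAC", "GTAAGGCCTT")
two_mismatches = percent_specific_base_pairing("AAGGCCTTAC", "GTAAGGCCAA")
print(perfect, complementarity_level(perfect))
print(two_mismatches, complementarity_level(two_mismatches))
```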
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As used herein, the term "alkyl," alone or in combination, refers to a 
saturated, straight-chain or branched-chain hydrocarbon radical containing from 1 to 10, 
preferably from 1 to 6 and more preferably from 1 to 4, carbon atoms. Examples of 
such radicals include, but are not limited to, methyl, ethyl, n-propyl, iso-propyl, n-butyl, 
iso-butyl, sec-butyl, tert-butyl, pentyl, iso-amyl, hexyl, decyl and the like. The term 
"alkylene" refers to a saturated, straight-chain or branched chain hydrocarbon diradical 
containing from 1 to 10, preferably from 1 to 6 and more preferably from 1 to 4, carbon 
atoms. Examples of such diradicals include, but are not limited to, methylene, ethylene 
(-CH 2 -CH 2 -), propylene, and the like.

The term "alkenyl," alone or in combination, refers to a straight-chain or 
branched-chain hydrocarbon radical having at least one carbon-carbon double bond in a 
total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon 
atoms. Examples of such radicals include, but are not limited to, ethenyl, E- and 
Z-propenyl, isopropenyl, E- and Z-butenyl, E- and Z-isobutenyl, E- and Z-pentenyl,
decenyl and the like. The term "alkenylene" refers to a straight-chain or branched-chain 
hydrocarbon diradical having at least one carbon-carbon double bond in a total of from 
2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. 
Examples of such diradicals include, but are not limited to, methylidene (=CH 2 ),
ethylidene (-CH=CH-), propylidene (-CH 2 -CH=CH-) and the like. 

The term "alkynyl," alone or in combination, refers to a straight-chain or 
branched-chain hydrocarbon radical having at least one carbon-carbon triple bond in a 
total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon 
atoms. Examples of such radicals include, but are not limited to, ethynyl (acetylenyl), 
propynyl (propargyl), butynyl, hexynyl, decynyl and the like. The term "alkynylene", 
alone or in combination, refers to a straight-chain or branched-chain hydrocarbon 
diradical having at least one carbon-carbon triple bond in a total of from 2 to 10, 
preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. Examples of 
such radicals include, but are not limited to, ethynylene (-C≡C-), propynylene (-CH 2 -
C≡C-) and the like.

The term "cycloalkyl," alone or in combination, refers to a saturated,
cyclic arrangement of carbon atoms which number from 3 to 8 and preferably from 3 to 
6, carbon atoms. Examples of such cycloalkyl radicals include, but are not limited to, 
cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl and the like. The term 
"cycloalkylene" refers to a diradical form of a cycloalkyl. 

The term "cycloalkenyl," alone or in combination, refers to a cyclic 
carbocycle containing from 4 to 8, preferably 5 or 6, carbon atoms and one or more 
double bonds. Examples of such cycloalkenyl radicals include, but are not limited to, 
cyclopentenyl, cyclohexenyl, cyclopentadienyl and the like. The term 
"cycloalkenylene" refers to a diradical form of a cycloalkenyl. 

The term "aryl" refers to a carbocyclic (consisting entirely of carbon and
hydrogen) aromatic group selected from the group consisting of phenyl, naphthyl,
indenyl, indanyl, azulenyl, fluorenyl, and anthracenyl; or a heterocyclic aromatic group
selected from the group consisting of furyl, thienyl, pyridyl, pyrrolyl, oxazolyl,
thiazolyl, imidazolyl, pyrazolyl, 2-pyrazolinyl, pyrazolidinyl, isoxazolyl, isothiazolyl,
1, 2, 3-oxadiazolyl, 1, 2, 3-triazolyl, 1, 3, 4-thiadiazolyl, pyridazinyl, pyrimidinyl,
pyrazinyl, 1, 3, 5-triazinyl, 1, 3, 5-trithianyl, indolizinyl, indolyl, isoindolyl, 3H-indolyl,
indolinyl, benzo[b]furanyl, 2, 3-dihydrobenzofuranyl, benzo[b]thiophenyl,
1H-indazolyl, benzimidazolyl, benzthiazolyl, purinyl, 4H-quinolizinyl, quinolinyl,
isoquinolinyl, cinnolinyl, phthalazinyl, quinazolinyl, quinoxalinyl, 1, 8-naphthyridinyl,
pteridinyl, carbazolyl, acridinyl, phenazinyl, phenothiazinyl, and phenoxazinyl.

"Aryl" groups, as defined in this application may independently contain 
one to four substituents which are independently selected from the group consisting of 
hydrogen, halogen, hydroxyl, amino, nitro, trifluoromethyl, trifluoromethoxy, alkyl, 
alkenyl, alkynyl, cyano, carboxy, carboalkoxy, 1,2-dioxyethylene, alkoxy, alkenoxy or
alkynoxy, alkylamino, alkenylamino, alkynylamino, aliphatic or aromatic acyl,
alkoxy-carbonylamino, alkylsulfonylamino, morpholinocarbonylamino,
thiomorpholinocarbonylamino, N-alkyl guanidino, aralkylaminosulfonyl;
aralkoxyalkyl; N-aralkoxyurea; N-hydroxylurea; N-alkenylurea; N,N-(alkyl,
hydroxyl)urea; heterocyclyl; thioaryloxy-substituted aryl; N,N-(aryl, alkyl)hydrazino;
Ar'-substituted sulfonylheterocyclyl; aralkyl-substituted heterocyclyl; cycloalkyl and
cycloalkenyl-substituted heterocyclyl; cycloalkyl-fused aryl; aryloxy-substituted alkyl;
heterocyclylamino; aliphatic or aromatic acylaminocarbonyl; aliphatic or aromatic
acyl-substituted alkenyl; Ar'-substituted aminocarbonyloxy; Ar', Ar'-disubstituted aryl;
aliphatic or aromatic acyl-substituted acyl; cycloalkylcarbonylalkyl;
cycloalkyl-substituted amino; aryloxycarbonylalkyl; phosphorodiamidyl acid or ester;

"Ar'" is a carbocyclic or heterocyclic aryl group as defined above having
one to three substituents selected from the group consisting of hydrogen, halogen,
hydroxyl, amino, nitro, trifluoromethyl, trifluoromethoxy, alkyl, alkenyl, alkynyl,
1,2-dioxymethylene, 1,2-dioxyethylene, alkoxy, alkenoxy, alkynoxy, alkylamino,
alkenylamino or alkynylamino, alkylcarbonyloxy, aliphatic or aromatic acyl,
alkylcarbonylamino, alkoxycarbonylamino, alkylsulfonylamino, N-alkyl or N,N-dialkyl
urea.

The term "alkoxy," alone or in combination, refers to an alkyl ether
radical, wherein the term "alkyl" is as defined above. Examples of suitable alkyl ether 
radicals include, but are not limited to, methoxy, ethoxy, n-propoxy, iso-propoxy, 
n-butoxy, iso-butoxy, sec-butoxy, tert-butoxy and the like. 

The term "alkenoxy," alone or in combination, refers to a radical of 
formula alkenyl-O-, wherein the term "alkenyl" is as defined above provided that the 
radical is not an enol ether. Examples of suitable alkenoxy radicals include, but are not 
limited to, allyloxy, E- and Z-3-methyl-2-propenoxy and the like.

The term "alkynyloxy," alone or in combination, refers to a radical of 
formula alkynyl-O-, wherein the term "alkynyl" is as defined above provided that the 
radical is not an ynol ether. Examples of suitable alkynoxy radicals include, but are not 
limited to, propargyloxy, 2-butynyloxy and the like. 

The term "thioalkoxy" refers to a thioether radical of formula alkyl-S-, 
wherein alkyl is as defined above. 

The term "alkylamino," alone or in combination, refers to a mono- or 
di-alkyl-substituted amino radical (i.e., a radical of formula alkyl-NH- or (alkyl) 2 -N-) 
wherein the term "alkyl" is as defined above. Examples of suitable alkylamino radicals
include, but are not limited to, methylamino, ethylamino, propylamino, isopropylamino,
t-butylamino, N,N-diethylamino and the like.

The term "alkenylamino," alone or in combination, refers to a radical of 
formula alkenyl-NH- or (alkenyl) 2 N-, wherein the term "alkenyl" is as defined above,
provided that the radical is not an enamine. An example of such alkenylamino radicals
is the allylamino radical. 

The term "alkynylamino," alone or in combination, refers to a radical of 
formula alkynyl-NH- or (alkynyl) 2 N-, wherein the term "alkynyl" is as defined above, 
provided that the radical is not an ynamine. An example of such alkynylamino radicals 
is the propargyl amino radical.

The term "amide" refers to either -N(R 1 )-C(=O)- or -C(=O)-N(R 1 )-
where R 1 is defined herein to include hydrogen as well as other groups. The term
"substituted amide" refers to the situation where R 1 is not hydrogen, while the term
"unsubstituted amide" refers to the situation where R 1 is hydrogen.

The term "aryloxy," alone or in combination, refers to a radical of
formula aryl-O-, wherein aryl is as defined above. Examples of aryloxy radicals
include, but are not limited to, phenoxy, naphthoxy, pyridyloxy and the like. 

The term "arylamino," alone or in combination, refers to a radical of 
formula aryl-NH-, wherein aryl is as defined above. Examples of arylamino radicals 
include, but are not limited to, phenylamino (anilido), naphthylamino, 2-, 3- and
4-pyridylamino and the like. 

The term "aryl-fused cycloalkyl," alone or in combination, refers to a 
cycloalkyl radical which shares two adjacent atoms with an aryl radical, wherein the 
terms "cycloalkyl" and "aryl" are as defined above. An example of an aryl-fused 
cycloalkyl radical is the benzofused cyclobutyl radical.

The term "alkylcarbonylamino," alone or in combination, refers to a 
radical of formula alkyl-CONH-, wherein the term "alkyl" is as defined above.

The term "alkoxycarbonylamino," alone or in combination, refers to a 
radical of formula alkyl-OCONH-, wherein the term "alkyl" is as defined above.

The term "alkylsulfonylamino," alone or in combination, refers to a
radical of formula alkyl-SO 2 NH-, wherein the term "alkyl" is as defined above.

The term "arylsulfonylamino," alone or in combination, refers to a 
radical of formula aryl-SO 2 NH-, wherein the term "aryl" is as defined above.

The term "N-alkylurea," alone or in combination, refers to a radical of
formula alkyl-NH-CO-NH-, wherein the term "alkyl" is as defined above.

The term "N-arylurea," alone or in combination, refers to a radical of 
formula aryl-NH-CO-NH-, wherein the term "aryl" is as defined above. 

The term "halogen" means fluorine, chlorine, bromine and iodine.

The term "hydrocarbon radical" refers to an arrangement of carbon and
hydrogen atoms which need only a single hydrogen atom to be an independent stable
molecule. Thus, a hydrocarbon radical has one open valence site on a carbon atom,
through which the hydrocarbon radical may be bonded to other atom(s). Alkyl, alkenyl,
cycloalkyl, etc. are examples of hydrocarbon radicals.

The term "hydrocarbon diradical" refers to an arrangement of carbon and
hydrogen atoms which need two hydrogen atoms in order to be an independent stable
molecule. Thus, a hydrocarbon diradical has two open valence sites on one or two carbon
atoms, through which the hydrocarbon diradical may be bonded to other atom(s).
Alkylene, alkenylene, alkynylene, cycloalkylene, etc. are examples of hydrocarbon
diradicals.

The term "hydrocarbyl" refers to any stable arrangement consisting 
entirely of carbon and hydrogen having a single valence site to which it is bonded to 
another moiety, and thus includes radicals known as alkyl, alkenyl, alkynyl, cycloalkyl, 
cycloalkenyl, aryl (without heteroatom incorporation into the aryl ring), arylalkyl, 
alkylaryl and the like. Hydrocarbon radical is another name for hydrocarbyl.

The term "hydrocarbylene" refers to any stable arrangement consisting 
entirely of carbon and hydrogen having two valence sites to which it is bonded to other 
moieties, and thus includes alkylene, alkenylene, alkynylene, cycloalkylene, 
cycloalkenylene, arylene (without heteroatom incorporation into the arylene ring),
arylalkylene, alkylarylene and the like. Hydrocarbon diradical is another name for
hydrocarbylene. 

The term "hydrocarbyl-O-hydrocarbylene" refers to a hydrocarbyl group 
bonded to an oxygen atom, where the oxygen atom is likewise bonded to a 
hydrocarbylene group at one of the two valence sites at which the hydrocarbylene group
is bonded to other moieties. The terms "hydrocarbyl-S-hydrocarbylene", "hydrocarbyl- 
NH-hydrocarbylene" and "hydrocarbyl-amide-hydrocarbylene" have equivalent 
meanings, where oxygen has been replaced with sulfur, -NH- or an amide group, 
respectively. 

The term N-(hydrocarbyl)hydrocarbylene refers to a hydrocarbylene
group wherein one of the two valence sites is bonded to a nitrogen atom, and that
nitrogen atom is simultaneously bonded to a hydrogen and a hydrocarbyl group. The
term N,N-di(hydrocarbyl)hydrocarbylene refers to a hydrocarbylene group wherein one
of the two valence sites is bonded to a nitrogen atom, and that nitrogen atom is
simultaneously bonded to two hydrocarbyl groups.

The term "hydrocarbylacyl-hydrocarbylene" refers to a hydrocarbyl 
group bonded through an acyl (-C(=0)-) group to one of the two valence sites of a 
hydrocarbylene group. 

The terms "heterocyclylhydrocarbyl" and "heterocyclyl" refer to a stable,
cyclic arrangement of atoms which include carbon atoms and up to four atoms (referred
to as heteroatoms) selected from oxygen, nitrogen, phosphorus and sulfur. The cyclic
arrangement may be in the form of a monocyclic ring of 3-7 atoms, or a bicyclic ring of
8-11 atoms. The rings may be saturated or unsaturated (including aromatic rings), and
may optionally be benzofused. Nitrogen and sulfur atoms in the ring may be in any
oxidized form, including the quaternized form of nitrogen. A heterocyclylhydrocarbyl
may be attached at any endocyclic carbon or heteroatom which results in the creation of
a stable structure. Preferred heterocyclylhydrocarbyls include 5-7 membered
monocyclic heterocycles containing one or two nitrogen heteroatoms.

A substituted heterocyclylhydrocarbyl refers to a
heterocyclylhydrocarbyl as defined above, wherein at least one ring atom thereof is
bonded to an indicated substituent which extends off of the ring.

In referring to hydrocarbyl and hydrocarbylene groups, the term
"derivatives of any of the foregoing wherein one or more hydrogens is replaced with an
equal number of fluorides" refers to molecules that contain carbon, hydrogen and 
fluoride atoms, but no other atoms. 

The term "activated ester" is an ester that contains a "leaving group" 
which is readily displaceable by a nucleophile, such as an amine, an alcohol or a thiol 
nucleophile. Such leaving groups are well known and include, without limitation,
N-hydroxysuccinimide, N-hydroxybenzotriazole, halogen (halides), alkoxy including 
tetrafluorophenolates, thioalkoxy and the like. The term "protected ester" refers to an 
ester group that is masked or otherwise unreactive. See, e.g., Greene, "Protecting 
Groups In Organic Synthesis." 

In view of the above definitions, other chemical terms used throughout 
this application can be easily understood by those of skill in the art. Terms may be used 
alone or in any combination thereof. The preferred and more preferred chain lengths of 
the radicals apply to all such combinations. 

A. GENERATION OF TAGGED NUCLEIC ACID FRAGMENTS

As noted above, one aspect of the present invention provides a general 
scheme for DNA sequencing which allows the use of more than 16 tags in each lane; 
with continuous detection, the tags can be detected and the sequence read as the size 
separation is occurring, just as with conventional fluorescence-based sequencing. This
scheme is applicable to any of the DNA sequencing techniques based on size separation
of tagged molecules. Suitable tags and linkers for use within the present invention, as 
well as methods for sequencing nucleic acids, are discussed in more detail below. 
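
The read-out described above, in which tags are detected as size separation proceeds, can be sketched as follows. This is a minimal illustration in Python under stated assumptions: fragments elute in order of increasing length, each elution step yields one detected tag per template, and each tag identifies one template and one terminating nucleotide. The tag identifiers, template names and the `tag_table` layout are hypothetical.

```python
from collections import defaultdict


def read_sequences(elution_events, tag_table):
    """Rebuild one read per template from tags detected in elution order.

    elution_events: list of lists of tag ids, ordered by increasing fragment length.
    tag_table: tag id -> (template name, terminating base).
    """
    reads = defaultdict(list)
    for detected_tags in elution_events:
        for tag in detected_tags:
            template, base = tag_table[tag]
            reads[template].append(base)
    return {template: "".join(bases) for template, bases in reads.items()}


# Hypothetical assignment: eight tags cover two templates run in the same lane.
tag_table = {
    "t1": ("s1", "A"), "t2": ("s1", "C"), "t3": ("s1", "G"), "t4": ("s1", "T"),
    "t5": ("s2", "A"), "t6": ("s2", "C"), "t7": ("s2", "G"), "t8": ("s2", "T"),
}
# Tags detected at three successive fragment lengths.
events = [["t1", "t6"], ["t2", "t5"], ["t3", "t8"]]

reads = read_sequences(events, tag_table)
print(reads)
```

Each read is that of the tagged fragment strand, which is complementary to the corresponding target.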



1. Tags

"Tag", as used herein, generally refers to a chemical moiety which is 
used to uniquely identify a "molecule of interest", and more specifically refers to the tag 
variable component as well as whatever may be bonded most closely to it in any of the 
tag reactant, tag component and tag moiety.

A tag which is useful in the present invention possesses several
attributes:

1) It is capable of being distinguished from all other tags. This
discrimination from other chemical moieties can be based on the chromatographic
behavior of the tag (particularly after the cleavage reaction), its spectroscopic or
potentiometric properties, or some combination thereof. Spectroscopic methods by
which tags are usefully distinguished include mass spectroscopy (MS), infrared (IR),
ultraviolet (UV), and fluorescence, where MS, IR and UV are preferred, and MS most
preferred spectroscopic methods. Potentiometric amperometry is a preferred
potentiometric method.

2) The tag is capable of being detected when present at 10~ 22 to 10* 6 

mole. 

3) The tag possesses a chemical handle through which it can be
attached to the MOI which the tag is intended to uniquely identify. The attachment may
be made directly to the MOI, or indirectly through a "linker" group.

4) The tag is chemically stable toward all manipulations to which it 
is subjected, including attachment and cleavage from the MOI, and any manipulations 
of the MOI while the tag is attached to it. 

5) The tag does not significantly interfere with the manipulations
performed on the MOI while the tag is attached to it. For instance, if the tag is attached
to an oligonucleotide, the tag must not significantly interfere with any hybridization or
enzymatic reactions (e.g., PCR sequencing reactions) performed on the oligonucleotide.
Similarly, if the tag is attached to an antibody, it must not significantly interfere with 
antigen recognition by the antibody. 
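
To give a sense of scale for attribute 2, the sketch below converts the stated detection range (read here as 10^-22 to 10^-6 mole) into approximate molecule counts using the Avogadro constant. The helper name `molecules_from_moles` is hypothetical and introduced only for illustration.

```python
# Illustrative only: convert the detection range of attribute 2 into molecule counts.
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_from_moles(moles: float) -> float:
    """Return the number of molecules in the given amount of substance."""
    if moles < 0:
        raise ValueError("amount of substance cannot be negative")
    return moles * AVOGADRO


if __name__ == "__main__":
    for amount in (1e-22, 1e-6):
        print(f"{amount:.0e} mole is about {molecules_from_moles(amount):.3g} molecules")
```

On this reading, the lower limit corresponds to roughly sixty individual tag molecules.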
A tag moiety which is intended to be detected by a certain spectroscopic
or potentiometric method should possess properties which enhance the sensitivity and
specificity of detection by that method. Typically, the tag moiety will have those
properties because they have been designed into the tag variable component, which will
typically constitute the major portion of the tag moiety. In the following discussion, the
word "tag" typically refers to the tag moiety (i.e., the cleavage product that
contains the tag variable component); however, it can also be considered to refer to the
tag variable component itself because that is the portion of the tag moiety which is
typically responsible for providing the uniquely detectable properties. In compounds of
the formula T-L-X, the "T" portion will contain the tag variable component. Where the tag
variable component has been designed to be characterized by, e.g., mass spectrometry,
the "T" portion of T-L-X may be referred to as T^ms. Likewise, the cleavage product
from T-L-X that contains T may be referred to as the T^ms-containing moiety. The
following spectroscopic and potentiometric methods may be used to characterize
T^ms-containing moieties.

a. Characteristics of MS Tags 

Where a tag is analyzable by mass spectrometry (i.e., is a MS-readable
tag, also referred to herein as a MS tag or "T^ms-containing moiety"), the essential
feature of the tag is that it is able to be ionized. It is thus a preferred element in the
design of MS-readable tags to incorporate therein a chemical functionality which can
carry a positive or negative charge under conditions of ionization in the MS. This
feature confers improved efficiency of ion formation and greater overall sensitivity of
detection, particularly in electrospray ionization. The chemical functionality that
supports an ionized charge may derive from T^ms or L or both. Factors that can increase
the relative sensitivity of an analyte being detected by mass spectrometry are discussed
in, e.g., Sunner, J., et al., Anal. Chem. 60:1300-1307 (1988).

A preferred functionality to facilitate the carrying of a negative charge is
an organic acid, such as phenolic hydroxyl, carboxylic acid, phosphonate, phosphate,
tetrazole, sulfonyl urea, perfluoro alcohol and sulfonic acid.

Preferred functionalities to facilitate the carrying of a positive charge
under ionization conditions are aliphatic or aromatic amines. Examples of amine
functional groups which give enhanced detectability of MS tags include quaternary
amines (i.e., amines that have four bonds, each to carbon atoms, see Aebersold, U.S.
Patent No. 5,240,859) and tertiary amines (i.e., amines that have three bonds, each to
carbon atoms, which includes C=N-C groups such as are present in pyridine, see Hess
et al., Anal. Biochem. 224:373, 1995; Bures et al., Anal. Biochem. 224:364, 1995).
Hindered tertiary amines are particularly preferred. Tertiary and quaternary amines may
be alkyl or aryl. A T^ms-containing moiety must bear at least one ionizable species, but
may possess more than one ionizable species. The preferred charge state is a single
ionized species per tag. Accordingly, it is preferred that each T^ms-containing moiety
(and each tag variable component) contain only a single hindered amine or organic acid
group.

Suitable amine-containing radicals that may form part of the T^ms-containing
moiety include the following:

[Structural drawings not legible in the source. Recoverable fragments, as printed:
-O-(C2-C10)-N(C1-C10)2; -(C1-C10)-N; -CNH-(C1-C10)-; -CNH-(C2-C10)-N;
-CNH-(C2-C10)-N(C1-C10)2; N(C1-C10); and NH.]

The identification of a tag by mass spectrometry is preferably based
upon its molecular mass to charge ratio (m/z). The preferred molecular mass range of
MS tags is from about 100 to 2,000 daltons, and preferably the T^ms-containing moiety
has a mass of at least about 250 daltons, more preferably at least about 300 daltons, and
still more preferably at least about 350 daltons. It is generally difficult for mass
spectrometers to distinguish among moieties having parent ions below about 200-250
daltons (depending on the precise instrument), and thus preferred T^ms-containing
moieties of the invention have masses above that range.

As explained above, the T^ms-containing moiety may contain atoms other
than those present in the tag variable component, and indeed other than present in T^ms
itself. Accordingly, the mass of T^ms itself may be less than about 250 daltons, so long
as the T^ms-containing moiety has a mass of at least about 250 daltons. Thus, the mass
of T^ms may range from 15 (i.e., a methyl radical) to about 10,000 daltons, and
preferably ranges from 100 to about 5,000 daltons, and more preferably ranges from
about 200 to about 1,000 daltons.
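
The mass preferences stated above for the T^ms-containing moiety can be sketched as a small classifier. This is a minimal illustration, not part of the disclosed method: the function names are hypothetical, the thresholds are the "about" values from the text treated as hard cut-offs, and the m/z calculation ignores the mass of the charge carrier.

```python
# Illustrative sketch: classify a T^ms-containing moiety by the mass tiers in the text.
def mass_to_charge(mass_da: float, charge: int = 1) -> float:
    """Simplified m/z: mass divided by the number of charges (charge carrier ignored)."""
    if charge == 0:
        raise ValueError("an ion must carry a non-zero charge")
    return mass_da / abs(charge)


def preference_tier(mass_da: float) -> str:
    """Map a moiety mass (daltons) onto the preference tiers described in the text."""
    if not 100 <= mass_da <= 2000:
        return "outside preferred range"
    if mass_da >= 350:
        return "still more preferred"
    if mass_da >= 300:
        return "more preferred"
    if mass_da >= 250:
        return "preferred"
    return "within range, below preferred minimum"


if __name__ == "__main__":
    for mass in (90, 180, 275, 320, 400):
        print(mass, preference_tier(mass), mass_to_charge(mass))
```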

It is relatively difficult to distinguish tags by mass spectrometry when
those tags incorporate atoms that have more than one isotope in significant abundance.
Accordingly, preferred T groups which are intended for mass spectroscopic
identification (T^ms groups) contain carbon, at least one of hydrogen and fluoride, and
optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. While
other atoms may be present in the T^ms, their presence can render analysis of the mass
spectral data somewhat more difficult. Preferably, the T^ms groups have only carbon,
nitrogen and oxygen atoms, in addition to hydrogen and/or fluoride.

Fluoride is an optional yet preferred atom to have in a T^ms group. In
comparison to hydrogen, fluoride is, of course, much heavier. Thus, the presence of
fluoride atoms rather than hydrogen atoms leads to T^ms groups of higher mass, thereby
allowing the T^ms group to reach and exceed a mass of greater than 250 daltons, which is
desirable as explained above. In addition, the replacement of hydrogen with fluoride
confers greater volatility on the T^ms-containing moiety, and greater volatility of the
analyte enhances sensitivity when mass spectrometry is being used as the detection
method.
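
As a rough illustration of the mass effect described above, the sketch below computes how much heavier a group becomes when hydrogen atoms are replaced by fluorine. The monoisotopic masses are standard reference values; the function name is hypothetical.

```python
# Illustrative sketch: mass gained by replacing hydrogen atoms with fluorine atoms.
MASS_H = 1.007825   # monoisotopic mass of 1H, daltons
MASS_F = 18.998403  # monoisotopic mass of 19F, daltons


def mass_gain_from_fluorination(n_replaced: int) -> float:
    """Return the mass increase (daltons) when n_replaced H atoms become F atoms."""
    if n_replaced < 0:
        raise ValueError("number of replaced atoms cannot be negative")
    return n_replaced * (MASS_F - MASS_H)


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, round(mass_gain_from_fluorination(n), 3))
```

Each replacement adds about 18 daltons, so ten substitutions alone lift a group by roughly 180 daltons toward the preferred mass range.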

The molecular formula of T^ms falls within the scope of
$C_{1-500}N_{0-100}O_{0-100}S_{0-10}P_{0-10}H_{\alpha}F_{\beta}I_{\delta}$, wherein the sum of
α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S
and P atoms. The designation
$C_{1-500}N_{0-100}O_{0-100}S_{0-10}P_{0-10}H_{\alpha}F_{\beta}I_{\delta}$ means that T^ms
contains at least one, and may contain any number from 1 to 500, carbon atoms, in
addition to optionally containing as many as 100 nitrogen atoms ("$N_{0-}$" means that
T^ms need not contain any nitrogen atoms), and as many as 100 oxygen atoms, and as
many as 10 sulfur atoms and as many as 10 phosphorus atoms. The symbols α, β and δ
represent the number of hydrogen, fluoride and iodide atoms in T^ms, where any two of
these numbers may be zero, and where the sum of these numbers equals the total of the
otherwise unsatisfied valencies of the C, N, O, S and P atoms. Preferably, T^ms has a
molecular formula that falls within the scope of
$C_{1-50}N_{0-10}O_{0-10}H_{\alpha}F_{\beta}$, where α and β equal the number of hydrogen and
fluoride atoms, respectively, present in the moiety.
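
The element-count limits given above for T^ms lend themselves to a simple programmatic check. The sketch below is a minimal illustration, assuming a formula is supplied as a dictionary of element counts; the names `SCOPE`, `in_scope` and `monoisotopic_mass` are hypothetical, and the valency condition on the hydrogen, fluoride and iodide counts is deliberately not checked because it depends on the bonding in the actual structure.

```python
# Illustrative sketch: check element counts against the broad T^ms scope and
# compute a monoisotopic mass. Valency satisfaction is NOT verified here.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.007825, "F": 18.998403, "I": 126.904473,
    "N": 14.003074, "O": 15.994915, "S": 31.972071, "P": 30.973762,
}
# (minimum, maximum) atom counts for the broad formula stated in the text
SCOPE = {"C": (1, 500), "N": (0, 100), "O": (0, 100), "S": (0, 10), "P": (0, 10)}


def in_scope(counts: dict) -> bool:
    """True if only permitted elements appear and the bounded counts are within range."""
    if any(element not in MONOISOTOPIC or n < 0 for element, n in counts.items()):
        return False
    return all(low <= counts.get(element, 0) <= high
               for element, (low, high) in SCOPE.items())


def monoisotopic_mass(counts: dict) -> float:
    """Sum of monoisotopic atomic masses for the given element counts."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


if __name__ == "__main__":
    example = {"C": 8, "H": 9, "N": 1, "O": 2}  # arbitrary example composition
    print(in_scope(example), round(monoisotopic_mass(example), 4))
```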






WO 97/27331 



PCT/US97/01304 






b. Characteristics of IR Tags

There are two primary forms of IR detection of organic chemical groups:
Raman scattering IR and absorption IR. Raman scattering IR spectra and absorption IR
spectra are complementary spectroscopic methods. In general, Raman excitation
depends on bond polarizability changes whereas IR absorption depends on bond dipole
moment changes. Weak IR absorption lines become strong Raman lines and vice versa.
Wavenumber is the characteristic unit for IR spectra. There are 3 spectral regions for IR
tags which have separate applications: near IR at 12500 to 4000 cm^-1, mid IR at 4000
to 600 cm^-1, and far IR at 600 to 30 cm^-1. For the uses described herein where a
compound is to serve as a tag to identify an MOI, probe or primer, the mid spectral
region would be preferred. For example, the carbonyl stretch (1850 to 1750 cm^-1) would
be measured for carboxylic acids, carboxylic esters and amides, and alkyl and aryl
carbonates, carbamates and ketones. N-H bending (1750 to 1600 cm^-1) would be used to
identify amines, ammonium ions, and amides. At 1400 to 1250 cm^-1, R-OH bending is
detected as well as the C-N stretch in amides. Aromatic substitution patterns are
detected at 900 to 690 cm^-1 (C-H bending, N-H bending for ArNH2). Saturated C-H,
olefins, aromatic rings, double and triple bonds, esters, acetals, ketals, ammonium
salts, N-O compounds such as oximes, nitro, N-oxides, and nitrates, azo, hydrazones,
quinones, carboxylic acids, amides, and lactams all possess vibrational infrared
correlation data (see Pretsch et al., Spectral Data for Structure Determination of
Organic Compounds, Springer-Verlag, New York, 1989). Preferred compounds would include
an aromatic nitrile, which exhibits a very strong nitrile stretching vibration at 2230
to 2210 cm^-1. Other useful types of compounds are aromatic alkynes, which have a strong
stretching vibration that gives rise to a sharp absorption band between 2140 and
2100 cm^-1. A third compound type is the aromatic azides, which exhibit an intense
absorption band in the 2160 to 2120 cm^-1 region. Thiocyanates are representative of
compounds that have a strong absorption at 2275 to 2263 cm^-1.
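
The band assignments above can be collected into a small lookup, sketched below. This is only an illustration: the table holds a subset of the ranges quoted in the text (in wavenumbers, cm^-1), the names `REGIONS`, `BANDS`, `spectral_region` and `candidate_groups` are hypothetical, and the handling of the shared region boundaries (4000 and 600 cm^-1) is an assumption.

```python
# Illustrative sketch: look up IR spectral region and candidate tag groups by wavenumber.
REGIONS = [  # (name, lower bound exclusive, upper bound inclusive); boundary rule assumed
    ("near IR", 4000, 12500),
    ("mid IR", 600, 4000),
    ("far IR", 29, 600),
]
BANDS = [  # (assignment, low, high), ranges as quoted in the text
    ("carbonyl stretch", 1750, 1850),
    ("R-OH bending / amide C-N stretch", 1250, 1400),
    ("aromatic substitution pattern", 690, 900),
    ("aromatic nitrile stretch", 2210, 2230),
    ("aromatic alkyne stretch", 2100, 2140),
    ("aromatic azide", 2120, 2160),
    ("thiocyanate", 2263, 2275),
]


def spectral_region(wavenumber: float) -> str:
    """Return the IR region containing the wavenumber, or 'out of range'."""
    for name, low, high in REGIONS:
        if low < wavenumber <= high:
            return name
    return "out of range"


def candidate_groups(wavenumber: float) -> list:
    """Return every tabulated assignment whose range contains the wavenumber."""
    return [name for name, low, high in BANDS if low <= wavenumber <= high]


if __name__ == "__main__":
    for w in (2220, 2130, 1800):
        print(w, spectral_region(w), candidate_groups(w))
```

Note that the alkyne and azide ranges overlap between 2120 and 2140 cm^-1, so a single band in that window would not by itself distinguish the two tag types.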
c. Characteristics of UV Tags

A compilation of organic chromophore types and their respective UV-visible
properties is given in Scott (Interpretation of the UV Spectra of Natural
Products, Pergamon Press, New York, 1962). A chromophore is an atom or group of
atoms or electrons that are responsible for the particular light absorption. Empirical
rules exist for the π to π* maxima in conjugated systems (see Pretsch et al., Spectral
Data for Structure Determination of Organic Compounds, p. B65 and B70, Springer-Verlag,
New York, 1989). Preferred compounds (with conjugated systems) would
possess n to π* and π to π* transitions. Such compounds are exemplified by Acid
Violet 7, Acridine Orange, Acridine Yellow G, Brilliant Blue G, Congo Red, Crystal
Violet, Malachite Green oxalate, Metanil Yellow, Methylene Blue, Methyl Orange,
Methyl Violet B, Naphthol Green B, Oil Blue N, Oil Red O, 4-phenylazophenol,
Safranine O, Solvent Green 3, and Sudan Orange G, all of which are commercially
available (Aldrich, Milwaukee, WI). Other suitable compounds are listed in, e.g., Jane,
I., et al., J. Chrom. 323:191-225 (1985).

d. Characteristic of a Fluorescent Tag 

Fluorescent probes are identified and quantitated most directly by their
absorption and fluorescence emission wavelengths and intensities. Emission spectra
(fluorescence and phosphorescence) are much more sensitive and permit more specific
measurements than absorption spectra. Other photophysical characteristics such as
excited-state lifetime and fluorescence anisotropy are less widely used. The most
generally useful intensity parameters are the molar extinction coefficient (ε) for
absorption and the quantum yield (QY) for fluorescence. The value of ε is specified at a
single wavelength (usually the absorption maximum of the probe), whereas QY is a
measure of the total photon emission over the entire fluorescence spectral profile. A
narrow optical bandwidth (<20 nm) is usually used for fluorescence excitation (via
absorption), whereas the fluorescence detection bandwidth is much more variable,
ranging from full spectrum for maximal sensitivity to narrow band (~20 nm) for
maximal resolution. Fluorescence intensity per probe molecule is proportional to the
product of ε and QY. The range of these parameters among fluorophores of current
practical importance is approximately 10,000 to 100,000 cm^-1 M^-1 for ε and 0.1 to 1.0
for QY. Compounds that can serve as fluorescent tags are as follows: fluorescein,
rhodamine, lambda blue 470, lambda green, lambda red 664, lambda red 665, acridine
orange, and propidium iodide, which are commercially available from Lambda
Fluorescence Co. (Pleasant Gap, PA). Fluorescent compounds such as nile red, Texas
Red, lissamine™ and BODIPY™s are available from Molecular Probes (Eugene, OR).
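
Because the fluorescence intensity per probe molecule is stated to be proportional to the product of the extinction coefficient and the quantum yield, candidate fluorophores can be ranked on that product. The sketch below is illustrative only; the function name is hypothetical and the two inputs shown are simply the end points of the parameter ranges quoted in the text, not measured values for any named dye.

```python
# Illustrative sketch: relative per-molecule brightness as extinction coefficient x QY.
def relative_brightness(extinction_coefficient: float, quantum_yield: float) -> float:
    """Return the product of molar extinction coefficient (cm^-1 M^-1) and quantum yield."""
    if extinction_coefficient < 0:
        raise ValueError("extinction coefficient cannot be negative")
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return extinction_coefficient * quantum_yield


if __name__ == "__main__":
    dimmest = relative_brightness(10_000, 0.1)
    brightest = relative_brightness(100_000, 1.0)
    print(dimmest, brightest, brightest / dimmest)
```

Across the quoted ranges the product spans about two orders of magnitude, which is the spread of per-molecule signal one might expect among practical fluorescent tags.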

e. Characteristics of Potentiometric Tags



The principle of electrochemical detection (ECD) is based on the oxidation
or reduction of compounds: at certain applied voltages, electrons are either
donated or accepted, thus producing a current which can be measured. When certain
compounds are subjected to a potential difference, the molecules undergo a molecular
rearrangement at the working electrode's surface with the loss (oxidation) or gain
(reduction) of electrons; such compounds are said to be electroactive and undergo
electrochemical reactions. EC detectors apply a voltage at an electrode surface over
which the HPLC eluent flows. Electroactive compounds eluting from the column either
donate electrons (oxidize) or acquire electrons (reduce), generating a current peak in
real time. Importantly, the amount of current generated depends on both the
concentration of the analyte and the voltage applied, with each compound having a
specific voltage at which it begins to oxidize or reduce. The currently most popular
electrochemical detector is the amperometric detector, in which the potential is kept
constant and the current produced from the electrochemical reaction is then measured.
This type of spectrometry is currently called "potentiostatic amperometry". Commercial
amperometers are available from ESA, Inc., Chelmsford, MA.

When the efficiency of detection is 100%, the specialized detectors are
termed "coulometric". Coulometric detectors are sensitive and have a number of
practical advantages with regard to selectivity and sensitivity which make these types of
detectors useful in an array. In coulometric detectors, for a given concentration of
analyte, the signal current is plotted as a function of the applied potential (voltage) to
the working electrode. The resultant sigmoidal graph is called the current-voltage curve
or hydrodynamic voltammogram (HDV). The HDV allows the best choice of applied
potential to the working electrode that permits one to maximize the observed signal. A
major advantage of ECD is its inherent sensitivity, with current levels of detection in
the subfemtomole range.

Numerous chemicals and compounds are electrochemically active,
including many biochemicals, pharmaceuticals and pesticides. Chromatographically
coeluting compounds can be effectively resolved even if their half-wave potentials (the
potential at half signal maximum) differ by only 30-60 mV.

Recently developed coulometric sensors provide selectivity,
identification and resolution of co-eluting compounds when used as detectors in liquid
chromatography based separations. Therefore, these arrayed detectors add another set of
separations accomplished in the detector itself. Current instruments possess 16 channels
which are in principle limited only by the rate at which data can be acquired. The
number of compounds which can be resolved on the EC array is chromatographically
limited (i.e., plate count limited). However, if two or more compounds that
chromatographically co-elute have a difference in half-wave potentials of 30-60 mV,
the array is able to distinguish the compounds. The ability of a compound to be
electrochemically active relies on the possession of an EC active group (i.e., -OH, -O,
-N, -S).
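
The sigmoidal current-voltage curve and the half-wave resolution criterion described above can be sketched numerically. This is a minimal illustration under stated assumptions: the logistic curve shape and its `slope_mv` parameter are modelling choices not taken from the text, the 30 mV threshold is the lower end of the quoted 30-60 mV range treated as a hard cut-off, and all function names are hypothetical.

```python
import math


def hdv_current(potential_mv: float, half_wave_mv: float,
                limiting_current: float = 1.0, slope_mv: float = 25.0) -> float:
    """Assumed logistic model of a hydrodynamic voltammogram (HDV).

    At the half-wave potential the current is half of its limiting value.
    """
    return limiting_current / (1.0 + math.exp(-(potential_mv - half_wave_mv) / slope_mv))


def resolvable(half_wave_a_mv: float, half_wave_b_mv: float,
               min_separation_mv: float = 30.0) -> bool:
    """True if two co-eluting compounds differ enough in half-wave potential."""
    return abs(half_wave_a_mv - half_wave_b_mv) >= min_separation_mv


if __name__ == "__main__":
    print(hdv_current(400.0, 400.0))   # 0.5, i.e. half of the limiting current
    print(resolvable(400.0, 440.0))    # True
    print(resolvable(400.0, 410.0))    # False
```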

Compounds which have been successfully detected using coulometric
detectors include 5-hydroxytryptamine, 3-methoxy-4-hydroxyphenyl-glycol,
homogentisic acid, dopamine, metanephrine, 3-hydroxykynurenine, acetaminophen,
3-hydroxytryptophol, 5-hydroxyindoleacetic acid, octanesulfonic acid, phenol, o-cresol,
pyrogallol, 2-nitrophenol, 4-nitrophenol, 2,4-dinitrophenol, 4,6-dinitrocresol,
3-methyl-2-nitrophenol, 2,4-dichlorophenol, 2,6-dichlorophenol, 2,4,5-trichlorophenol,
4-chloro-3-methylphenol, 5-methylphenol, 4-methyl-2-nitrophenol, 2-hydroxyaniline,
4-hydroxyaniline, 1,2-phenylenediamine, benzocatechin, buturon, chlortholuron, diuron,
isoproturon, linuron, methobromuron, metoxuron, monolinuron, monuron, methionine,
tryptophan, tyrosine, 4-aminobenzoic acid, 4-hydroxybenzoic acid, 4-hydroxycoumaric
acid, 7-methoxycoumarin, apigenin, baicalein, caffeic acid, catechin, centaurein,
chlorogenic acid, daidzein, datiscetin, diosmetin, epicatechin gallate, epigallo catechin,
epigallo catechin gallate, eugenol, eupatorin, ferulic acid, fisetin, galangin, gallic acid,
gardenin, genistein, gentisic acid, hesperidin, irigenin, kaempferol, leucocyanidin,
luteolin, mangostin, morin, myricetin, naringin, narinrtin, pelargonidin, peonidin,
phloretin, pratensein, protocatechuic acid, rhamnetin, quercetin, sakuranetin,
scutellarein, scopoletin, syringaldehyde, syringic acid, tangeritin, troxerutin,
umbelliferone, vanillic acid, 1,3-dimethyl tetrahydroisoquinoline, 6-hydroxydopamine,
r-salsolinol, N-methyl-r-salsolinol, tetrahydroisoquinoline, amitriptyline, apomorphine,
capsaicin, chlordiazepoxide, chlorpromazine, daunorubicin, desipramine, doxepin,
fluoxetine, flurazepam, imipramine, isoproterenol, methoxamine, morphine,
morphine-3-glucuronide, nortriptyline, oxazepam, phenylephrine, trimipramine, ascorbic
acid, N-acetyl serotonin, 3,4-dihydroxybenzylamine, 3,4-dihydroxymandelic acid (DOMA),
3,4-dihydroxyphenylacetic acid (DOPAC), 3,4-dihydroxyphenylalanine (L-DOPA),
3,4-dihydroxyphenylglycol (DHPG), 3-hydroxyanthranilic acid, 2-hydroxyphenylacetic
acid (2HPAC), 4-hydroxybenzoic acid (4HBAC), 5-hydroxyindole-3-acetic acid
(5HIAA), 3-hydroxykynurenine, 3-hydroxymandelic acid,
3-hydroxy-4-methoxyphenylethylamine, 4-hydroxyphenylacetic acid (4HPAC),
4-hydroxyphenyllactic acid (4HPLA), 5-hydroxytryptophan (5HTP),
5-hydroxytryptophol (5HTOL), 5-hydroxytryptamine (5HT), 5-hydroxytryptamine
sulfate, 3-methoxy-4-hydroxyphenylglycol (MHPG), 5-methoxytryptamine,
5-methoxytryptophan, 5-methoxytryptophol, 3-methoxytyramine (3MT),
3-methoxytyrosine (3-OM-DOPA), 5-methylcysteine, 3-methylguanine, bufotenin,
dopamine, dopamine-3-glucuronide, dopamine-3-sulfate, dopamine-4-sulfate,
epinephrine, epinine, folic acid, glutathione (reduced), guanine, guanosine,
homogentisic acid (HGA), homovanillic acid (HVA), homovanillyl alcohol (HVOL),
homoveratic acid, HVA sulfate, hypoxanthine, indole, indole-3-acetic acid,
indole-3-lactic acid, kynurenine, melatonin, metanephrine, N-methyltryptamine,
N-methyltyramine, N,N-dimethyltryptamine, N,N-dimethyltyramine, norepinephrine,
normetanephrine, octopamine, pyridoxal, pyridoxal phosphate, pyridoxamine,
synephrine, tryptophol, tryptamine, tyramine, uric acid, vanillylmandelic acid (VMA),
xanthine and xanthosine. Other suitable compounds are set forth in, e.g., Jane, I., et al.,
J. Chrom. 323:191-225 (1985) and Musch, G., et al., J. Chrom. 348:97-110 (1985).
These compounds can be incorporated into compounds of formula T-L-X by methods
known in the art. For example, compounds having a carboxylic acid group may be
reacted with amine, hydroxyl, etc. to form amide, ester and other linkages between T
and L.

In addition to the above properties, and regardless of the intended
detection method, it is preferred that the tag have a modular chemical structure. This
aids in the construction of large numbers of structurally related tags using the
techniques of combinatorial chemistry. For example, the T^ms group desirably has
several properties. It desirably contains a functional group which supports a single
ionized charge state when the T^ms-containing moiety is subjected to mass spectrometry
(more simply referred to as a "mass spec sensitivity enhancer" group, or MSSE). Also,
it desirably can serve as one member in a family of T^ms-containing moieties, where
members of the family each have a different mass/charge ratio but have
approximately the same sensitivity in the mass spectrometer. Thus, the members of the
family desirably have the same MSSE. In order to allow the creation of families of
compounds, it has been found convenient to generate tag reactants via a modular
synthesis scheme, so that the tag components themselves may be viewed as comprising
modules.

In a preferred modular approach to the structure of the T^ms group, T^ms
has the formula

T^2-(J-T^3-)n-

wherein T^2 is an organic moiety formed from carbon and one or more of hydrogen,
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass range of 15 to
500 daltons; T^3 is an organic moiety formed from carbon and one or more of hydrogen,
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass range of 50 to
1000 daltons; J is a direct bond or a functional group such as amide, ester, amine,
sulfide, ether, thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate,
Schiff base, reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate,
phosphoramide, phosphonamide, sulfonate, sulfonamide or carbon-carbon bond; and n
is an integer ranging from 1 to 50, such that when n is greater than 1, each T^3 and J is
independently selected.

The modular structure T^2-(J-T^3)n- provides a convenient entry to families
of T-L-X compounds, where each member of the family has a different T group. For
instance, when T is T^ms, and each family member desirably has the same MSSE, one of
the T^3 groups can provide that MSSE structure. In order to provide variability between
members of a family in terms of the mass of T^ms, the T^2 group may be varied among
family members. For instance, one family member may have T^2 = methyl, while
another has T^2 = ethyl, and another has T^2 = propyl, etc.
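
The methyl, ethyl, propyl example can be made concrete with a short calculation. The sketch below is illustrative: it assumes straight-chain alkyl radicals of formula CnH(2n+1) as the T^2 groups, uses standard monoisotopic masses, and treats the rest of the tag as a fixed, hypothetical mass (`constant_mass`); the function names are likewise hypothetical.

```python
# Illustrative sketch: masses of a tag family whose members differ only in an alkyl T^2.
MASS_C = 12.0       # monoisotopic mass of 12C, daltons
MASS_H = 1.007825   # monoisotopic mass of 1H, daltons


def alkyl_radical_mass(n_carbons: int) -> float:
    """Monoisotopic mass of an alkyl radical CnH(2n+1), e.g. methyl for n = 1."""
    if n_carbons < 1:
        raise ValueError("an alkyl radical needs at least one carbon")
    return n_carbons * MASS_C + (2 * n_carbons + 1) * MASS_H


def family_masses(constant_mass: float, carbon_counts: list) -> list:
    """Total mass of each family member: shared portion plus its alkyl T^2 group."""
    return [constant_mass + alkyl_radical_mass(n) for n in carbon_counts]


if __name__ == "__main__":
    for name, n in (("methyl", 1), ("ethyl", 2), ("propyl", 3)):
        print(name, round(alkyl_radical_mass(n), 4))
    print([round(m, 3) for m in family_masses(300.0, [1, 2, 3])])
```

Successive members of such a family are spaced by one CH2 unit, about 14 daltons, which gives evenly separated m/z values while the shared MSSE keeps sensitivity comparable.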

In order to provide "gross" or large jumps in mass, a T^3 group may be
designed which adds a significant number (e.g., one or several hundreds) of mass units
to T-L-X. Such a T^3 group may be referred to as a molecular weight range adjuster
group ("WRA"). A WRA is quite useful if one is working with a single set of T^2 groups,
which will have masses extending over a limited range. A single set of T^2 groups may
be used to create T^ms groups having a wide range of mass simply by incorporating one
or more WRA T^3 groups into the T^ms. Thus, using a simple example, if a set of T^2
groups affords a mass range of 250-340 daltons for the T^ms, the addition of a single
WRA having, as an exemplary number, a mass of 100 daltons, as a T^3 group provides
access to the mass range of 350-440 daltons while using the same set of T^2 groups.
Similarly, the addition of two 100 dalton WRA groups (each as a T^3 group) provides
access to the mass range of 450-540 daltons, where this incremental addition of WRA
groups can be continued to provide access to a very large mass range for the T^ms group.
Preferred compounds of the formula T^2-(J-T^3-)n-L-X have the formula
R_VWC-(R_WRA)_w-R_MSSE-L-X, where VWC is a "T^2" group, and each of the WRA and MSSE
groups are "T^3" groups. This structure is illustrated in Figure 13, and represents one
modular approach to the preparation of T^ms.
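
The worked example above (a 250-340 dalton base range shifted by 100 dalton WRA groups) can be reproduced with a few lines of code. This is a minimal sketch; the function name `mass_range_with_wra` is hypothetical and the numbers are the exemplary values from the text.

```python
# Illustrative sketch: shift a T^ms mass range by adding weight range adjuster groups.
def mass_range_with_wra(base_range: tuple, wra_mass: float, n_wra: int) -> tuple:
    """Return the (low, high) mass range after adding n_wra groups of mass wra_mass."""
    if n_wra < 0:
        raise ValueError("number of WRA groups cannot be negative")
    low, high = base_range
    return (low + n_wra * wra_mass, high + n_wra * wra_mass)


if __name__ == "__main__":
    base = (250, 340)  # range afforded by one set of T^2 groups, per the example
    for n in range(3):
        print(n, mass_range_with_wra(base, 100, n))
    # prints: 0 (250, 340), then 1 (350, 440), then 2 (450, 540)
```

Because each added WRA shifts the whole window without changing its width, a single set of T^2 groups can tile a wide mass range in non-overlapping bands as long as the WRA mass exceeds the width of the base range.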






In the formula T^2-(J-T^3-)n-, T^2 and T^3 are preferably selected from
hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl-S-hydrocarbylene,
hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-amide-hydrocarbylene,
N-(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene,
hydrocarbylacyl-hydrocarbylene, heterocyclylhydrocarbyl wherein the heteroatom(s) are
selected from oxygen, nitrogen, sulfur and phosphorus, substituted
heterocyclylhydrocarbyl wherein the heteroatom(s) are selected from oxygen, nitrogen,
sulfur and phosphorus and the substituents are selected from hydrocarbyl,
hydrocarbyl-O-hydrocarbylene, hydrocarbyl-NH-hydrocarbylene,
hydrocarbyl-S-hydrocarbylene, N-(hydrocarbyl)hydrocarbylene,
N,N-di(hydrocarbyl)hydrocarbylene and hydrocarbylacyl-hydrocarbylene. In addition, T^2
and/or T^3 may be a derivative of any of the previously listed potential T^2/T^3 groups,
such that one or more hydrogens are replaced by fluorides.

Also regarding the formula T^2-(J-T^3-)n-, a preferred T^3 has the
formula -G(R^2)-, wherein G is a C1-6 alkylene chain having a single R^2 substituent.
Thus, if G is ethylene (-CH2-CH2-) either one of the two ethylene carbons may have
a R^2 substituent, and R^2 is selected from alkyl, alkenyl, alkynyl, cycloalkyl,
aryl-fused cycloalkyl, cycloalkenyl, aryl, aralkyl, aryl-substituted alkenyl or
alkynyl, cycloalkyl-substituted alkyl, cycloalkenyl-substituted cycloalkyl, biaryl,
alkoxy, alkenoxy, alkynoxy, aralkoxy, aryl-substituted alkenoxy or alkynoxy,
alkylamino, alkenylamino or alkynylamino, aryl-substituted alkylamino,
aryl-substituted alkenylamino or alkynylamino, aryloxy, arylamino,
N-alkylurea-substituted alkyl, N-arylurea-substituted alkyl,
alkylcarbonylamino-substituted alkyl, aminocarbonyl-substituted alkyl,
heterocyclyl, heterocyclyl-substituted alkyl, heterocyclyl-substituted amino,
carboxyalkyl substituted aralkyl, oxocarbocyclyl-fused aryl and heterocyclylalkyl;
cycloalkenyl, aryl-substituted alkyl and aralkyl, hydroxy-substituted alkyl,
alkoxy-substituted alkyl, aralkoxy-substituted alkyl, alkoxy-substituted alkyl,
aralkoxy-substituted alkyl, amino-substituted alkyl, (aryl-substituted
alkyloxycarbonylamino)-substituted alkyl, thiol-substituted alkyl,
alkylsulfonyl-substituted alkyl, (hydroxy-substituted alkylthio)-substituted alkyl,
thioalkoxy-substituted alkyl, hydrocarbylacylamino-substituted alkyl,
heterocyclylacylamino-substituted alkyl,
hydrocarbyl-substituted-heterocyclylacylamino-substituted alkyl,
alkylsulfonylamino-substituted alkyl, arylsulfonylamino-substituted alkyl,
morpholino-alkyl, thiomorpholino-alkyl, morpholino carbonyl-substituted alkyl,
thiomorpholinocarbonyl-substituted alkyl, [N-(alkyl, alkenyl or alkynyl)- or N,N-
[dialkyl, dialkenyl, dialkynyl or (alkyl, alkenyl)-amino]carbonyl-substituted alkyl,
heterocyclylaminocarbonyl, heterocyclylalkyleneaminocarbonyl,
heterocyclylaminocarbonyl-substituted alkyl,
heterocyclylalkyleneaminocarbonyl-substituted alkyl,
N,N-[dialkyl]alkyleneaminocarbonyl, N,N-[dialkyl]alkyleneaminocarbonyl-substituted
alkyl, alkyl-substituted heterocyclylcarbonyl, alkyl-substituted
heterocyclylcarbonyl-alkyl, carboxyl-substituted alkyl, dialkylamino-substituted
acylaminoalkyl and amino acid side chains selected from arginine, asparagine,
glutamine, S-methyl cysteine, methionine and corresponding sulfoxide and sulfone
derivatives thereof, glycine, leucine, isoleucine, allo-isoleucine, tert-leucine,
norleucine, phenylalanine, tyrosine, tryptophan, proline, alanine, ornithine, histidine,
glutamine, valine, threonine, serine, aspartic acid, beta-cyanoalanine, and
allothreonine; alkynyl and heterocyclylcarbonyl, aminocarbonyl, amido, mono- or
dialkylaminocarbonyl, mono- or diarylaminocarbonyl, alkylarylaminocarbonyl,
diarylaminocarbonyl, mono- or diacylaminocarbonyl, aromatic or aliphatic acyl, alkyl
optionally substituted by substituents selected from amino, carboxy, hydroxy, mercapto,
mono- or dialkylamino, mono- or diarylamino, alkylarylamino, diarylamino, mono- or
diacylamino, alkoxy, alkenoxy, aryloxy, thioalkoxy, thioalkenoxy, thioalkynoxy,
thioaryloxy and heterocyclyl.

A preferred compound of the formula T^2-(J-T^3-)n-L-X has the structure:

wherein G is (CH2)1-6 such that a hydrogen on one and only one of the CH2 groups
represented by a single "G" is replaced with -(CH2)c-Amide-T^4; T^2 and T^4 are organic
moieties of the formula $C_{1-25}N_{0-9}O_{0-9}H_{\alpha}F_{\beta}$ such that the sum of α and β
is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O atoms;
Amide is -N(R^1)-C(=O)- or -C(=O)-N(R^1)-; R^1 is hydrogen or C1-10 alkyl; c is an
integer ranging from 0 to 4; and n is an integer ranging from 1 to 50 such that when n
is greater than 1, G, c, Amide, R^1 and T^4 are independently selected.
In a further preferred embodiment, a compound of the formula
T^2-(J-T^3-)n-L-X has the structure:

[Structure not legible in the source. Recoverable fragments: T^4, Amide, (CH2)c, R^1
and X.]

wherein T^5 is an organic moiety of the formula
$C_{1-25}N_{0-9}O_{0-9}H_{\alpha}F_{\beta}$ such that the sum of α and β is sufficient to
satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; and T^5 includes
a tertiary or quaternary amine or an organic acid; m is an integer ranging from 0-49,
and T^2, T^4, R^1, L and X have been previously defined.

Another preferred compound having the formula T^2-(J-T^3-)n-L-X has the
particular structure:

wherein T^5 is an organic moiety of the formula
$C_{1-25}N_{0-9}O_{0-9}H_{\alpha}F_{\beta}$ such that the sum of α and β is sufficient to
satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; and T^5 includes
a tertiary or quaternary amine or an organic acid; m is an integer ranging from 0-49,
and T^2, T^4, c, R^1, "Amide", L and X have been previously defined.

In the above structures that have a T^5 group, -Amide-T^5 is preferably
one of the following, which are conveniently made by reacting organic acids with free
amino groups extending from "G":

[Structures not legible in the source. Recoverable fragments, as printed:
-NHC-aryl-O-(C2-C10)-N(C1-C10)2; -NHC-N-(C1-C10); and -NHC-(C1-C10)-N-.]

Where the above compounds have a T^5 group, and the "G" group has a
free carboxyl group (or reactive equivalent thereof), then the following are preferred
-Amide-T^5 groups, which may conveniently be prepared by reacting the appropriate
organic amine with a free carboxyl group extending from a "G" group:

[Structures not legible in the source. Recoverable fragments, as printed:
-CNH-(C1-C10)-; -CNH-(C2-C10)-N; -CNH-(C2-C10)-N(C1-C10)2; -CN N(C1-C10); and NH.]
In three preferred embodiments of the invention, T-L-MOI has the
structure:

[Chemical structure not reproduced; the structure terminates in -(C1-C10)-ODN-3'-OH]

or the structure

[Chemical structure not reproduced; the structure terminates in -(C1-C10)-ODN-3'-OH]

or the structure:

[Chemical structure not reproduced; the structure terminates in -(C1-C10)-ODN-3'-OH]


wherein T2 and T4 are organic moieties of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ such
that the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of
the C, N, O, S and P atoms; G is (CH2)1-6 wherein one and only one hydrogen on the
CH2 groups represented by each G is replaced with -(CH2)c-Amide-T4; Amide is
-N(R1)-C(=O)- or -C(=O)-N(R1)-; R1 is hydrogen or C1-10 alkyl; c is an integer ranging
from 0 to 4; "C2-C10" represents a hydrocarbylene group having from 2 to 10 carbon
atoms; "ODN-3'-OH" represents a nucleic acid fragment having a terminal 3' hydroxyl
group (i.e., a nucleic acid fragment joined to (C1-C10) at other than the 3' end of the
nucleic acid fragment); and n is an integer ranging from 1 to 50 such that when n is
greater than 1, then G, c, Amide, R1 and T4 are independently selected. Preferably there
are not three heteroatoms bonded to a single carbon atom.

wherein T2 and T4 are organic moieties of the formula C1-25N0-9O0-9HαFβ such that the
sum of α and β is sufficient to satisfy the otherwise unsatisfied valencies of the C, N,
and O atoms; G is (CH2)1-6 wherein one and only one hydrogen on the CH2 groups
represented by each G is replaced with -(CH2)c-Amide-T4; Amide is
-N(R1)-C(=O)- or -C(=O)-N(R1)-; R1 is hydrogen or C1-10 alkyl; c is an integer ranging
from 0 to 4; "ODN-3'-OH" represents a nucleic acid fragment having a terminal 3'
hydroxyl group; and n is an integer ranging from 1 to 50 such that when n is greater
than 1, G, c, Amide, R1 and T4 are independently selected.

In structures as set forth above that contain a T2-C(=O)-N(R1)- group,
this group may be formed by reacting an amine of the formula HN(R1)- with an organic
acid selected from the following, which are exemplary only and do not constitute an
exhaustive list of potential organic acids: Formic acid, Acetic acid, Propiolic acid,
Propionic acid, Fluoroacetic acid, 2-Butynoic acid, Cyclopropanecarboxylic acid,
Butyric acid, Methoxyacetic acid, Difluoroacetic acid, 4-Pentynoic acid,
Cyclobutanecarboxylic acid, 3,3-Dimethylacrylic acid, Valeric acid,
N,N-Dimethylglycine, N-Formyl-Gly-OH, Ethoxyacetic acid, (Methylthio)acetic acid,
Pyrrole-2-carboxylic acid, 3-Furoic acid, Isoxazole-5-carboxylic acid,
trans-3-Hexenoic acid, Trifluoroacetic acid, Hexanoic acid, Ac-Gly-OH,
2-Hydroxy-2-methylbutyric acid, Benzoic acid, Nicotinic acid,
2-Pyrazinecarboxylic acid, 1-Methyl-2-pyrrolecarboxylic acid,
2-Cyclopentene-1-acetic acid, Cyclopentylacetic acid,
(S)-(-)-2-Pyrrolidone-5-carboxylic acid, N-Methyl-L-proline, Heptanoic acid,
Ac-b-Ala-OH, 2-Ethyl-2-hydroxybutyric acid, 2-(2-Methoxyethoxy)acetic acid,
p-Toluic acid, 6-Methylnicotinic acid, 5-Methyl-2-pyrazinecarboxylic acid,
2,5-Dimethylpyrrole-3-carboxylic acid, 4-Fluorobenzoic acid,
3,5-Dimethylisoxazole-4-carboxylic acid, 3-Cyclopentylpropionic acid, Octanoic acid,
N,N-Dimethylsuccinamic acid,
Phenylpropiolic acid, Cinnamic acid, 4-Ethylbenzoic acid, p-Anisic acid,
1,2,5-Trimethylpyrrole-3-carboxylic acid, 3-Fluoro-4-methylbenzoic acid,
Ac-DL-Propargylglycine, 3-(Trifluoromethyl)butyric acid, 1-Piperidinepropionic acid,
N-Acetylproline, 3,5-Difluorobenzoic acid, Ac-L-Val-OH, Indole-2-carboxylic acid,
2-Benzofurancarboxylic acid, Benzotriazole-5-carboxylic acid, 4-n-Propylbenzoic acid,
3-Dimethylaminobenzoic acid, 4-Ethoxybenzoic acid, 4-(Methylthio)benzoic acid,
N-(2-Furoyl)glycine, 2-(Methylthio)nicotinic acid, 3-Fluoro-4-methoxybenzoic acid,
Tfa-Gly-OH, 2-Naphthoic acid, Quinaldic acid, Ac-L-Ile-OH,
3-Methylindene-2-carboxylic acid, 2-Quinoxalinecarboxylic acid,
1-Methylindole-2-carboxylic acid, 2,3,6-Trifluorobenzoic acid, N-Formyl-L-Met-OH,
2-[2-(2-Methoxyethoxy)ethoxy]acetic acid, 4-n-Butylbenzoic acid, N-Benzoylglycine,
5-Fluoroindole-2-carboxylic acid, 4-n-Propoxybenzoic acid,
4-Acetyl-3,5-dimethyl-2-pyrrolecarboxylic acid, 3,5-Dimethoxybenzoic acid,
2,6-Dimethoxynicotinic acid, Cyclohexanepentanoic acid, 2-Naphthylacetic acid,
4-(1H-Pyrrol-1-yl)benzoic acid, Indole-3-propionic acid,
m-Trifluoromethylbenzoic acid, 5-Methoxyindole-2-carboxylic acid,
4-Pentylbenzoic acid, Bz-b-Ala-OH, 4-Diethylaminobenzoic acid,
4-n-Butoxybenzoic acid, 3-Methyl-5-CF3-isoxazole-4-carboxylic acid,
(3,4-Dimethoxyphenyl)acetic acid, 4-Biphenylcarboxylic acid, Pivaloyl-Pro-OH,
Octanoyl-Gly-OH, (2-Naphthoxy)acetic acid, Indole-3-butyric acid,
4-(Trifluoromethyl)phenylacetic acid, 5-Methoxyindole-3-acetic acid,
4-(Trifluoromethoxy)benzoic acid, Ac-L-Phe-OH, 4-Pentyloxybenzoic acid, Z-Gly-OH,
4-Carboxy-N-(fur-2-ylmethyl)pyrrolidin-2-one, 3,4-Diethoxybenzoic acid,
2,4-Dimethyl-5-CO2Et-pyrrole-3-carboxylic acid, N-(2-Fluorophenyl)succinamic acid,
3,4,5-Trimethoxybenzoic acid, N-Phenylanthranilic acid, 3-Phenoxybenzoic acid,
Nonanoyl-Gly-OH, 2-Phenoxypyridine-3-carboxylic acid,
2,5-Dimethyl-1-phenylpyrrole-3-carboxylic acid, trans-4-(Trifluoromethyl)cinnamic acid,
(5-Methyl-2-phenyloxazol-4-yl)acetic acid, 4-(2-Cyclohexenyloxy)benzoic acid,
5-Methoxy-2-methylindole-3-acetic acid, trans-4-Cotininecarboxylic acid,
Bz-5-Aminovaleric acid, 4-Hexyloxybenzoic acid, N-(3-Methoxyphenyl)succinamic acid,
Z-Sar-OH, 4-(3,4-Dimethoxyphenyl)butyric acid, Ac-o-Fluoro-DL-Phe-OH,
N-(4-Fluorophenyl)glutaramic acid, 4'-Ethyl-4-biphenylcarboxylic acid,
1,2,3,4-Tetrahydroacridinecarboxylic acid, 3-Phenoxyphenylacetic acid,
N-(2,4-Difluorophenyl)succinamic acid, N-Decanoyl-Gly-OH,
(+)-6-Methoxy-α-methyl-2-naphthaleneacetic acid, 3-(Trifluoromethoxy)cinnamic acid,
N-Formyl-DL-Trp-OH, (R)-(+)-α-Methoxy-α-(trifluoromethyl)phenylacetic acid,
Bz-DL-Leu-OH, 4-(Trifluoromethoxy)phenoxyacetic acid, 4-Heptyloxybenzoic acid,
2,3,4-Trimethoxycinnamic acid, 2,6-Dimethoxybenzoyl-Gly-OH,
3-(3,4,5-Trimethoxyphenyl)propionic acid, 2,3,4,5,6-Pentafluorophenoxyacetic acid,
N-(2,4-Difluorophenyl)glutaramic acid, N-Undecanoyl-Gly-OH,
2-(4-Fluorobenzoyl)benzoic acid, 5-Trifluoromethoxyindole-2-carboxylic acid,
N-(2,4-Difluorophenyl)diglycolamic acid, Ac-L-Trp-OH, Tfa-L-Phenylglycine-OH,
3-Iodobenzoic acid, 3-(4-n-Pentylbenzoyl)propionic acid,
2-Phenyl-4-quinolinecarboxylic acid, 4-Octyloxybenzoic acid, Bz-L-Met-OH,
3,4,5-Triethoxybenzoic acid, N-Lauroyl-Gly-OH, 3,5-Bis(trifluoromethyl)benzoic acid,
Ac-5-Methyl-DL-Trp-OH, 2-Iodophenylacetic acid, 3-Iodo-4-methylbenzoic acid,
3-(4-n-Hexylbenzoyl)propionic acid, N-Hexanoyl-L-Phe-OH, 4-Nonyloxybenzoic acid,
4'-(Trifluoromethyl)-2-biphenylcarboxylic acid, Bz-L-Phe-OH, N-Tridecanoyl-Gly-OH,
3,5-Bis(trifluoromethyl)phenylacetic acid, 3-(4-n-Heptylbenzoyl)propionic acid,
N-Heptanoyl-L-Phe-OH, 4-Decyloxybenzoic acid,
N-(α,α,α-trifluoro-m-tolyl)anthranilic acid, Niflumic acid,
4-(2-Hydroxyhexafluoroisopropyl)benzoic acid, N-Myristoyl-Gly-OH,
3-(4-n-Octylbenzoyl)propionic acid, N-Octanoyl-L-Phe-OH, 4-Undecyloxybenzoic acid,
3-(3,4,5-Trimethoxyphenyl)propionyl-Gly-OH, 8-Iodonaphthoic acid,
N-Pentadecanoyl-Gly-OH, 4-Dodecyloxybenzoic acid, N-Palmitoyl-Gly-OH, and
N-Stearoyl-Gly-OH. These organic acids are available from one or more of Advanced
ChemTech, Louisville, KY; Bachem Bioscience Inc., Torrance, CA;
Calbiochem-Novabiochem Corp., San Diego, CA; Farchan Laboratories Inc.,
Gainesville, FL; Lancaster Synthesis, Windham, NH; and May Bridge Chemical Company
(c/o Ryan Scientific), Columbia, SC. The catalogs from these companies use the
abbreviations which are used above to identify the acids.
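
Because each acid is intended to give rise to a tag that is distinguishable by mass spectrometry, it may be useful to confirm that a chosen subset of acids has well-separated masses. The following is a minimal illustrative sketch and is not part of the original disclosure: the helper names `parse_formula` and `monoisotopic_mass` are hypothetical, the molecular formulas are the standard ones for five acids named in the list above, and the isotope masses are rounded literature values.

```python
import re
from collections import Counter

# Monoisotopic masses (Da) of the most abundant isotopes, rounded.
ISOTOPE_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074,
                "O": 15.994915, "F": 18.998403}

# Standard molecular formulas for a few acids named in the list above.
ACIDS = {
    "Formic acid": "CH2O2",
    "Acetic acid": "C2H4O2",
    "Fluoroacetic acid": "C2H3FO2",
    "Benzoic acid": "C7H6O2",
    "Nicotinic acid": "C6H5NO2",
}

def parse_formula(formula):
    """Return element counts for a simple formula such as 'C2H4O2'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def monoisotopic_mass(formula):
    """Sum the isotope masses over all atoms in the formula."""
    return sum(ISOTOPE_MASS[el] * n for el, n in parse_formula(formula).items())

masses = {name: monoisotopic_mass(f) for name, f in ACIDS.items()}
for name, mass in sorted(masses.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {mass:9.4f}")

# Smallest mass difference between any two neighbouring acids in the subset.
ordered = sorted(masses.values())
min_gap = min(b - a for a, b in zip(ordered, ordered[1:]))
print(f"smallest gap: {min_gap:.4f} Da")
```

In this subset the closest pair is benzoic acid and nicotinic acid, which differ by roughly one dalton; a larger set of acids could be screened the same way before being committed to a tag library.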

f. Combinatorial Chemistry as a Means for Preparing Tags
Combinatorial chemistry is a type of synthetic strategy which leads to
the production of large chemical libraries (see, for example, PCT Application
Publication No. WO 94/08051). These combinatorial libraries can be used as tags for
the identification of molecules of interest (MOIs). Combinatorial chemistry may be
defined as the systematic and repetitive, covalent connection of a set of different
"building blocks" of varying structures to each other to yield a large array of diverse
molecular entities. Building blocks can take many forms, both naturally occurring and
synthetic, such as nucleophiles, electrophiles, dienes, alkylating or acylating agents,
diamines, nucleotides, amino acids, sugars, lipids, organic monomers, synthons, and
combinations of the above. Chemical reactions used to connect the building blocks
may involve alkylation, acylation, oxidation, reduction, hydrolysis, substitution,
elimination, addition, cyclization, condensation, and the like. This process can produce
libraries of compounds which are oligomeric, non-oligomeric, or combinations thereof.
If oligomeric, the compounds can be branched, unbranched, or cyclic. Examples of
oligomeric structures which can be prepared by combinatorial methods include
oligopeptides, oligonucleotides, oligosaccharides, polylipids, polyesters, polyamides,
polyurethanes, polyureas, polyethers, poly(phosphorus derivatives), e.g., phosphates,
phosphonates, phosphoramides, phosphonamides, phosphites, phosphinamides, etc., and
poly(sulfur derivatives), e.g., sulfones, sulfonates, sulfites, sulfonamides, sulfenamides,
etc.
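
The "systematic and repetitive" connection of building blocks described above can be pictured as enumerating every ordered combination of blocks over a fixed number of positions. The sketch below is illustrative only: the function name `enumerate_library` and the four amino acid labels are assumptions chosen for the example, not part of the disclosure.

```python
from itertools import product

def enumerate_library(building_blocks, positions):
    """Yield every ordered combination of building blocks as a joined string."""
    for combo in product(building_blocks, repeat=positions):
        yield "-".join(combo)

# Hypothetical set of four building blocks joined over three positions.
blocks = ["Gly", "Ala", "Ser", "Phe"]
library = list(enumerate_library(blocks, positions=3))

print(len(library))   # 4 ** 3 = 64 members
print(library[:2])    # ['Gly-Gly-Gly', 'Gly-Gly-Ala']

# Library size grows as (number of blocks) ** (number of positions);
# 20 blocks over 6 positions would already give 64,000,000 members.
print(20 ** 6)
```

The exponential growth in library size is what makes a modest set of building blocks sufficient to generate a large array of distinct molecular entities.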



One common type of oligomeric combinatorial library is the peptide
combinatorial library. Recent innovations in peptide chemistry and molecular biology
have enabled libraries consisting of tens to hundreds of millions of different peptide
sequences to be prepared and used. Such libraries can be divided into three broad
categories. One category of libraries involves the chemical synthesis of soluble
non-support-bound peptide libraries (e.g., Houghten et al., Nature 354:84, 1991). A second
category involves the chemical synthesis of support-bound peptide libraries, presented
on solid supports such as plastic pins, resin beads, or cotton (Geysen et al., Mol.
Immunol. 23:709, 1986; Lam et al., Nature 354:82, 1991; Eichler and Houghten,
Biochemistry 32:11035, 1993). In these first two categories, the building blocks are
typically L-amino acids, D-amino acids, unnatural amino acids, or some mixture or
combination thereof. A third category uses molecular biology approaches to prepare
peptides or proteins on the surface of filamentous phage particles or plasmids (Scott and
Craig, Curr. Opinion Biotech. 5:40, 1994). Soluble, non-support-bound peptide libraries
appear to be suitable for a number of applications, including use as tags. The available
repertoire of chemical diversities in peptide libraries can be expanded by steps such as
permethylation (Ostresh et al., Proc. Natl. Acad. Sci. USA 91:11138, 1994).
Numerous variants of peptide combinatorial libraries are possible in
which the peptide backbone is modified, and/or the amide bonds have been replaced by
mimetic groups. Amide mimetic groups which may be used include ureas, urethanes,
and carbonylmethylene groups. Restructuring the backbone such that sidechains
emanate from the amide nitrogens of each amino acid, rather than the alpha-carbons,
gives libraries of compounds known as peptoids (Simon et al., Proc. Natl. Acad. Sci.
USA 89:9367, 1992).

Another common type of oligomeric combinatorial library is the
oligonucleotide combinatorial library, where the building blocks are some form of
naturally occurring or unnatural nucleotide or polysaccharide derivatives, including
where various organic and inorganic groups may substitute for the phosphate linkage,
and nitrogen or sulfur may substitute for oxygen in an ether linkage (Schneider et al.,
Biochem. 34:9599, 1995; Freier et al., J. Med. Chem. 38:344, 1995; Frank, J.
Biotechnology 41:259, 1995; Schneider et al., Published PCT WO 942052; Ecker et al.,
Nucleic Acids Res. 21:1853, 1993).

More recently, the combinatorial production of collections of
non-oligomeric, small molecule compounds has been described (DeWitt et al., Proc. Natl.
Acad. Sci. USA 90:690, 1993; Bunin et al., Proc. Natl. Acad. Sci. USA 91:4708, 1994).
Structures suitable for elaboration into small-molecule libraries encompass a wide
variety of organic molecules, for example heterocyclics, aromatics, alicyclics,
aliphatics, steroids, antibiotics, enzyme inhibitors, ligands, hormones, drugs, alkaloids,
opioids, terpenes, porphyrins, toxins, catalysts, as well as combinations thereof.

g. Specific Methods for Combinatorial Synthesis of Tags 
Two methods for the preparation and use of a diverse set of
amine-containing MS tags are outlined below. In both methods, solid phase synthesis is
employed to enable simultaneous parallel synthesis of a large number of tagged linkers,
using the techniques of combinatorial chemistry. In the first method, the eventual
cleavage of the tag from the oligonucleotide results in liberation of a carboxyl amide.
In the second method, cleavage of the tag produces a carboxylic acid. The chemical
components and linking elements used in these methods are abbreviated as follows:



R                     = resin
FMOC                  = fluorenylmethoxycarbonyl protecting group
All                   = allyl protecting group
CO2H                  = carboxylic acid group
CONH2                 = carboxylic amide group
NH2                   = amino group
OH                    = hydroxyl group
CONH                  = amide linkage
COO                   = ester linkage
NH2 - Rink - CO2H     = 4-[(α-amino)-2,4-dimethoxybenzyl]-phenoxybutyric
                        acid (Rink linker)
OH - 1MeO - CO2H      = (4-hydroxymethyl)phenoxybutyric acid
OH - 2MeO - CO2H      = (4-hydroxymethyl-3-methoxy)phenoxyacetic acid
NH2 - A - COOH        = amino acid with aliphatic or aromatic amine
                        functionality in side chain
X1 ... Xn - COOH      = set of n diverse carboxylic acids with unique
                        molecular weights
oligo1 ... oligo(n)   = set of n oligonucleotides
HBTU                  = O-benzotriazol-1-yl-N,N,N',N'-tetramethyluronium
                        hexafluorophosphate



The sequence of steps in Method 1 is as follows:

OH - 2MeO - CONH - R
    ↓ FMOC - NH - Rink - CO2H; couple (e.g., HBTU)
FMOC - NH - Rink - COO - 2MeO - CONH - R
    ↓ piperidine (remove FMOC)
NH2 - Rink - COO - 2MeO - CONH - R
    ↓ FMOC - NH - A - COOH; couple (e.g., HBTU)
FMOC - NH - A - CONH - Rink - COO - 2MeO - CONH - R
    ↓ piperidine (remove FMOC)
NH2 - A - CONH - Rink - COO - 2MeO - CONH - R
    ↓ divide into n aliquots
    ↓↓↓↓↓ couple to n different acids X1 ... Xn - COOH
X1 ... Xn - CONH - A - CONH - Rink - COO - 2MeO - CONH - R
    ↓↓↓↓↓ cleave tagged linkers from resin with 1% TFA
X1 ... Xn - CONH - A - CONH - Rink - CO2H
    ↓↓↓↓↓ couple to n oligos (oligo1 ... oligo(n))
          (e.g., via Pfp esters)
X1 ... Xn - CONH - A - CONH - Rink - CONH - oligo1 ... oligo(n)
    ↓ pool tagged oligos
    ↓ perform sequencing reaction
    ↓ separate different length fragments from
      sequencing reaction (e.g., via HPLC or CE)
    ↓ cleave tags from linkers with 25%-100% TFA
X1 ... Xn - CONH - A - CONH2
    ↓
analyze by mass spectrometry

The sequence of steps in Method 2 is as follows:

OH - 1MeO - CO2 - All
    ↓ FMOC - NH - A - CO2H; couple (e.g., HBTU)
FMOC - NH - A - COO - 1MeO - CO2 - All
    ↓ Palladium (remove Allyl)
FMOC - NH - A - COO - 1MeO - CO2H
    ↓ OH - 2MeO - CONH - R; couple (e.g., HBTU)
FMOC - NH - A - COO - 1MeO - COO - 2MeO - CONH - R
    ↓ piperidine (remove FMOC)
NH2 - A - COO - 1MeO - COO - 2MeO - CONH - R
    ↓ divide into n aliquots
    ↓↓↓↓↓ couple to n different acids X1 ... Xn - CO2H
X1 ... Xn - CONH - A - COO - 1MeO - COO - 2MeO - CONH - R
    ↓↓↓↓↓ cleave tagged linkers from resin with 1% TFA
X1 ... Xn - CONH - A - COO - 1MeO - CO2H
    ↓↓↓↓↓ couple to n oligos (oligo1 ... oligo(n))
          (e.g., via Pfp esters)
X1 ... Xn - CONH - A - COO - 1MeO - CONH - oligo1 ... oligo(n)
    ↓ pool tagged oligos
    ↓ perform sequencing reaction
    ↓ separate different length fragments from
      sequencing reaction (e.g., via HPLC or CE)
    ↓ cleave tags from linkers with 25-100% TFA
X1 ... Xn - CONH - A - CO2H
    ↓
analyze by mass spectrometry
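
Method 1 is stated to liberate a carboxyl amide on the released tag and Method 2 a carboxylic acid. For the same X and A, the two released species therefore differ only in a terminal -NH2 versus -OH. The following minimal sketch works out the resulting mass offset; the function name is hypothetical and the isotope masses are rounded literature values, so the result should be read as an illustration rather than a specification.

```python
# Monoisotopic masses (Da), rounded literature values.
H, N, O = 1.007825, 14.003074, 15.994915

def acid_minus_amide_shift():
    """Mass change when a terminal -CONH2 is replaced by -CO2H.

    Only the end group changes: -NH2 (N + 2H) becomes -OH (O + H).
    """
    amide_end = N + 2 * H
    acid_end = O + H
    return acid_end - amide_end

shift = acid_minus_amide_shift()
print(f"{shift:.4f} Da")  # 0.9840 Da
```

Under these assumptions, a tag released by Method 2 would appear about 0.98 Da heavier than the corresponding tag released by Method 1.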
2. Linkers 

A "linker" component (or L), as used herein, means either a direct
covalent bond or an organic chemical group which is used to connect a "tag" (or T) to a
"molecule of interest" (or MOI) through covalent chemical bonds. In addition, the
direct bond itself, or one or more bonds within the linker component, is cleavable under
conditions which allow T to be released (in other words, cleaved) from the remainder
of the T-L-X compound (including the MOI component). The tag variable component
which is present within T should be stable to the cleavage conditions. Preferably, the
cleavage can be accomplished rapidly: within a few minutes and preferably within
about 15 seconds or less.

In general, a linker is used to connect each of a large set of tags to each
of a similarly large set of MOIs. Typically, a single tag-linker combination is attached
to each MOI (to give various T-L-MOI), but in some cases, more than one tag-linker
combination may be attached to each individual MOI (to give various (T-L)n-MOI). In
another embodiment of the present invention, two or more tags are bonded to a single
linker through multiple, independent sites on the linker, and this multiple tag-linker
combination is then bonded to an individual MOI (to give various (T)n-L-MOI).

After various manipulations of the set of tagged MOIs, special chemical
and/or physical conditions are used to cleave one or more covalent bonds in the linker,
resulting in the liberation of the tags from the MOIs. The cleavable bond(s) may or
may not be some of the same bonds that were formed when the tag, linker, and MOI
were connected together. The design of the linker will, in large part, determine the
conditions under which cleavage may be accomplished. Accordingly, linkers may be
identified by the cleavage conditions they are particularly susceptible to. When a
linker is photolabile (i.e., prone to cleavage by exposure to actinic radiation), the linker
may be given the designation L^hν. Likewise, the designations L^acid, L^base, L^[O], L^[R],
L^enz, L^elc, L^Δ and L^ss may be used to refer to linkers that are particularly susceptible to
cleavage by acid, base, chemical oxidation, chemical reduction, the catalytic activity of
an enzyme (more simply "enzyme"), electrochemical oxidation or reduction, elevated
temperature ("thermal") and thiol exchange, respectively.
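
The correspondence between linker designations and cleavage conditions can be kept as a simple lookup. The sketch below is illustrative only: the caret notation (e.g., `L^acid`) is a plain-text rendering of the superscripted designations assumed for this example, and the function name `designations_for` is hypothetical.

```python
# Plain-text renderings of the linker designations mapped to the
# cleavage condition named in the text (rendering is an assumption).
LINKER_DESIGNATIONS = {
    "L^hν": "actinic radiation (photolysis)",
    "L^acid": "acid",
    "L^base": "base",
    "L^[O]": "chemical oxidation",
    "L^[R]": "chemical reduction",
    "L^enz": "enzyme",
    "L^elc": "electrochemical oxidation or reduction",
    "L^Δ": "elevated temperature (thermal)",
    "L^ss": "thiol exchange",
}

def designations_for(*conditions):
    """Return designations whose cleavage condition mentions any given term."""
    wanted = [c.lower() for c in conditions]
    return [name for name, cond in LINKER_DESIGNATIONS.items()
            if any(term in cond for term in wanted)]

# Designations relevant to acid cleavage or photolysis.
print(designations_for("acid", "photolysis"))  # ['L^hν', 'L^acid']
```

A linker that is labile to several conditions would simply carry more than one of these designations.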

Certain types of linker are labile to a single type of cleavage condition,
whereas others are labile to several types of cleavage conditions. In addition, in linkers
which are capable of bonding multiple tags (to give (T)n-L-MOI type structures), each
of the tag-bonding sites may be labile to different cleavage conditions. For example, in
a linker having two tags bonded to it, one of the tags may be labile only to base, and the
other labile only to photolysis.

A linker which is useful in the present invention possesses several
attributes:

1) The linker possesses a chemical handle (Lh) through which it can be
attached to an MOI.

2) The linker possesses a second, separate chemical handle (Lh) through
which the tag is attached to the linker. If multiple tags are attached to a single linker
((T)n-L-MOI type structures), then a separate handle exists for each tag.

3) The linker is stable toward all manipulations to which it is subjected,
with the exception of the conditions which allow cleavage such that a T-containing
moiety is released from the remainder of the compound, including the MOI. Thus, the
linker is stable during attachment of the tag to the linker, attachment of the linker to the
MOI, and any manipulations of the MOI while the tag and linker (T-L) are attached to
it.


4) The linker does not significantly interfere with the manipulations 
performed on the MOI while the T-L is attached to it. For instance, if the T-L is 
attached to an oligonucleotide, the T-L must not significantly interfere with any 
hybridization or enzymatic reactions (e.g., PCR) performed on the oligonucleotide. 
Similarly, if the T-L is attached to an antibody, it must not significantly interfere with 
antigen recognition by the antibody. 

5) Cleavage of the tag from the remainder of the compound occurs in a 
highly controlled manner, using physical or chemical processes that do not adversely 
affect the detectability of the tag. 

For any given linker, it is preferred that the linker be attachable to a wide
variety of MOIs, and that a wide variety of tags be attachable to the linker. Such
flexibility is advantageous because it allows a library of T-L conjugates, once prepared,
to be used with several different sets of MOIs.

As explained above, a preferred linker has the formula

Lh-L1-L2-L3-Lh

wherein each Lh is a reactive handle that can be used to link the linker to a tag reactant
and a molecule of interest reactant. L2 is an essential part of the linker, because L2
imparts lability to the linker. L1 and L3 are optional groups which effectively serve to
separate L2 from the handles Lh.

L1 (which, by definition, is nearer to T than is L3) serves to separate T
from the required labile moiety L2. This separation may be useful when the cleavage
reaction generates particularly reactive species (e.g., free radicals) which may cause
random changes in the structure of the T-containing moiety. As the cleavage site is
further separated from the T-containing moiety, there is a reduced likelihood that
reactive species formed at the cleavage site will disrupt the structure of the T-containing
moiety. Also, as the atoms in L1 will typically be present in the T-containing moiety,
these L1 atoms may impart a desirable quality to the T-containing moiety. For example,
where the T-containing moiety is a T^ms-containing moiety, and a hindered amine is
desirably present as part of the structure of the T^ms-containing moiety (to serve, e.g., as
a MSSE), the hindered amine may be present in the L1 moiety.

In other instances, L1 and/or L3 may be present in a linker component
merely because the commercial supplier of a linker chooses to sell the linker in a form
having such an L1 and/or L3 group. In such an instance, there is no harm in using linkers
having L1 and/or L3 groups (so long as these groups do not inhibit the cleavage reaction),
even though they may not contribute any particular performance advantage to the
compounds that incorporate them. Thus, the present invention allows for L1 and/or L3
groups to be present in the linker component.

L1 and/or L3 groups may be a direct bond (in which case the group is
effectively not present), a hydrocarbylene group (e.g., alkylene, arylene, cycloalkylene,
etc.), -O-hydrocarbylene (e.g., -O-CH2-, -O-CH2CH(CH3)-, etc.) or
hydrocarbylene-(O-hydrocarbylene)w- wherein w is an integer ranging from 1 to about 10
(e.g., -CH2-O-Ar-, -CH2-(O-CH2CH2)4-, etc.).

With the advent of solid phase synthesis, a great body of literature has
developed regarding linkers that are labile to specific reaction conditions. In typical
solid phase synthesis, a solid support is bonded through a labile linker to a reactive site,
and a molecule to be synthesized is generated at the reactive site. When the molecule
has been completely synthesized, the solid support-linker-molecule construct is
subjected to cleavage conditions which release the molecule from the solid support.
The labile linkers which have been developed for use in this context (or which may be
used in this context) may also be readily used as the linker reactant in the present
invention.



Lloyd-Williams, P., et al., "Convergent Solid-Phase Peptide Synthesis",
Tetrahedron Report No. 347, 49(48):11065-11133 (1993) provides an extensive
discussion of linkers which are labile to actinic radiation (i.e., photolysis), as well as
acid, base and other cleavage conditions. Additional sources of information about labile
linkers are well known in the art.

As described above, different linker designs will confer cleavability
("lability") under different specific physical or chemical conditions. Examples of
conditions which serve to cleave various designs of linker include acid, base, oxidation,
reduction, fluoride, thiol exchange, photolysis, and enzymatic conditions.

Examples of cleavable linkers that satisfy the general criteria for linkers 
listed above will be well known to those in the art and include those found in the 
catalog available from Pierce (Rockford, IL). Examples include: 

• ethylene glycolbis(succinimidylsuccinate) (EGS), an amine reactive
cross-linking reagent which is cleavable by hydroxylamine (1 M at 37°C
for 3-6 hours);

• disuccinimidyl tartrate (DST) and sulfo-DST, which are amine reactive
cross-linking reagents, cleavable by 0.015 M sodium periodate;

• bis[2-(succinimidyloxycarbonyloxy)ethyl]sulfone (BSOCOES) and
sulfo-BSOCOES, which are amine reactive cross-linking reagents,
cleavable by base (pH 11.6);

• 1,4-di-[3'-(2'-pyridyldithio)propionamido]butane (DPDPB), a
pyridyldithiol crosslinker which is cleavable by thiol exchange or
reduction;

• N-[4-(p-azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide
(APDP), a pyridyldithiol crosslinker which is cleavable by thiol
exchange or reduction;

• bis-[beta-4-(azidosalicylamido)ethyl]-disulfide, a photoreactive
crosslinker which is cleavable by thiol exchange or reduction;

• N-succinimidyl-(4-azidophenyl)-1,3'-dithiopropionate (SADP), a
photoreactive crosslinker which is cleavable by thiol exchange or
reduction;

• sulfosuccinimidyl-2-(7-azido-4-methylcoumarin-3-acetamide)ethyl-1,3'-
dithiopropionate (SAED), a photoreactive crosslinker which is cleavable
by thiol exchange or reduction;

• sulfosuccinimidyl-2-(m-azido-o-nitrobenzamido)-ethyl-
1,3'-dithiopropionate (SAND), a photoreactive crosslinker which is
cleavable by thiol exchange or reduction.
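
The reagents listed above can be organized by cleavage condition so that a linker compatible with a planned workflow is easy to pick out. The following is a minimal sketch under stated assumptions: only the reagents given an abbreviation in the list are included, the condition labels are shortened paraphrases of the text, and the function name `cleavable_by` is hypothetical.

```python
# Abbreviation -> (reactive chemistry, cleavage conditions), taken from
# the list above; condition labels are shortened for lookup purposes.
CLEAVABLE_CROSSLINKERS = {
    "EGS": ("amine-reactive", {"hydroxylamine"}),
    "DST": ("amine-reactive", {"periodate"}),
    "BSOCOES": ("amine-reactive", {"base"}),
    "DPDPB": ("pyridyldithiol", {"thiol exchange", "reduction"}),
    "APDP": ("pyridyldithiol", {"thiol exchange", "reduction"}),
    "SADP": ("photoreactive", {"thiol exchange", "reduction"}),
    "SAED": ("photoreactive", {"thiol exchange", "reduction"}),
    "SAND": ("photoreactive", {"thiol exchange", "reduction"}),
}

def cleavable_by(condition):
    """Return the sorted abbreviations of reagents cleaved by `condition`."""
    return sorted(name for name, (_, conditions) in CLEAVABLE_CROSSLINKERS.items()
                  if condition in conditions)

print(cleavable_by("reduction"))  # ['APDP', 'DPDPB', 'SADP', 'SAED', 'SAND']
print(cleavable_by("base"))       # ['BSOCOES']
```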

Other examples of cleavable linkers and the cleavage conditions that can
be used to release tags are as follows. A silyl linking group can be cleaved by fluoride
or under acidic conditions. A 3-, 4-, 5-, or 6-substituted-2-nitrobenzyloxy or 2-, 3-, 5-,
or 6-substituted-4-nitrobenzyloxy linking group can be cleaved by a photon source
(photolysis). A 3-, 4-, 5-, or 6-substituted-2-alkoxyphenoxy or 2-, 3-, 5-, or
6-substituted-4-alkoxyphenoxy linking group can be cleaved by Ce(NH4)2(NO3)6
(oxidation). A NCO2 (urethane) linker can be cleaved by hydroxide (base), acid, or
LiAlH4 (reduction). A 3-pentenyl, 2-butenyl, or 1-butenyl linking group can be cleaved
by O3, OsO4/IO4-, or KMnO4 (oxidation). A 2-[3-, 4-, or 5-substituted-furyl]oxy linking
group can be cleaved by O2, Br2, MeOH, or acid.

Conditions for the cleavage of other labile linking groups include:
t-alkyloxy linking groups can be cleaved by acid; methyl(dialkyl)methoxy or
4-substituted-2-alkyl-1,3-dioxolane-2-yl linking groups can be cleaved by H3O+;
2-silylethoxy linking groups can be cleaved by fluoride or acid; 2-(X)-ethoxy (where
X = keto, ester amide, cyano, NO2, sulfide, sulfoxide, sulfone) linking groups can be
cleaved under alkaline conditions; 2-, 3-, 4-, 5-, or 6-substituted-benzyloxy linking
groups can be cleaved by acid or under reductive conditions; 2-butenyloxy linking
groups can be cleaved by (Ph3P)3RhCl(H); 3-, 4-, 5-, or 6-substituted-2-bromophenoxy
linking groups can be cleaved by Li, Mg, or BuLi; methylthiomethoxy linking groups
can be cleaved by Hg2+; 2-(X)-ethyloxy (where X = a halogen) linking groups can be
cleaved by Zn or Mg; 2-hydroxyethyloxy linking groups can be cleaved by oxidation
(e.g., with Pb(OAc)4).

Preferred linkers are those that are cleaved by acid or photolysis. Several
of the acid-labile linkers that have been developed for solid phase peptide synthesis are
useful for linking tags to MOIs. Some of these linkers are described in a recent review
by Lloyd-Williams et al. (Tetrahedron 49:11065-11133, 1993). One useful type of
linker is based upon p-alkoxybenzyl alcohols, of which two,
4-hydroxymethylphenoxyacetic acid and 4-(4-hydroxymethyl-3-methoxyphenoxy)butyric
acid, are commercially available from Advanced ChemTech (Louisville, KY). Both
linkers can be attached to a tag via an ester linkage to the benzyl alcohol, and to an
amine-containing MOI via an amide linkage to the carboxylic acid. Tags linked by
these molecules are released from the MOI with varying concentrations of
trifluoroacetic acid. The cleavage of these linkers results in the liberation of a
carboxylic acid on the tag. Acid cleavage of tags attached through related linkers, such
as 2,4-dimethoxy-4'-(carboxymethyloxy)-benzhydrylamine (available from Advanced
ChemTech in FMOC-protected form), results in liberation of a carboxylic amide on the
released tag.

The photolabile linkers useful for this application have also been for the
most part developed for solid phase peptide synthesis (see Lloyd-Williams review).
These linkers are usually based on 2-nitrobenzylesters or 2-nitrobenzylamides. Two
examples of photolabile linkers that have recently been reported in the literature are
4-(4-(1-Fmoc-amino)ethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (Holmes and Jones,
J. Org. Chem. 60:2318-2319, 1995) and 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic
acid (Brown et al., Molecular Diversity 1:4-12, 1995). Both linkers can be attached via
the carboxylic acid to an amine on the MOI. The attachment of the tag to the linker is
made by forming an amide between a carboxylic acid on the tag and the amine on the
linker. Cleavage of photolabile linkers is usually performed with UV light of 350 nm
wavelength at intensities and times known to those in the art. Cleavage of the linkers
results in liberation of a primary amide on the tag. Examples of photocleavable linkers
include nitrophenyl glycine esters, exo- and endo-2-benzonorborneyl chlorides and
methane sulfonates, and 3-amino-3-(2-nitrophenyl)propionic acid. Examples of
enzymatic cleavage include esterases which will cleave ester bonds, nucleases which
will cleave phosphodiester bonds, proteases which cleave peptide bonds, etc.

A preferred linker component has an ortho-nitrobenzyl structure as
shown below:

[Chemical structure not reproduced]

wherein one carbon atom at positions a, b, c, d or e is substituted with -L3-X, and L1
(which is preferably a direct bond) is present to the left of N(R1) in the above structure.
Such a linker component is susceptible to selective photo-induced cleavage of the bond
between the carbon labeled "a" and N(R1). The identity of R1 is not typically critical to
the cleavage reaction; however, R1 is preferably selected from hydrogen and
hydrocarbyl. The present invention provides that in the above structure, -N(R1)- could
be replaced with -O-. Also in the above structure, one or more of positions b, c, d or e
may optionally be substituted with alkyl, alkoxy, fluoride, chloride, hydroxyl,
carboxylate or amide, where these substituents are independently selected at each
occurrence.



A further preferred linker component with a chemical handle Lh has the
following structure:



wherein one or more of positions b, c, d or e is substituted with hydrogen, alkyl, alkoxy,
fluoride, chloride, hydroxyl, carboxylate or amide, R1 is hydrogen or hydrocarbyl, and
R2 is -OH or a group that either protects or activates a carboxylic acid for coupling with
another moiety. Fluorocarbon and hydrofluorocarbon groups are preferred groups that
activate a carboxylic acid toward coupling with another moiety.






3. Molecule of Interest (MOI)

Examples of MOIs include nucleic acids or nucleic acid analogues (e.g., 
PNA), fragments of nucleic acids (i.e., nucleic acid fragments), synthetic nucleic acids 
or fragments, oligonucleotides (e.g., DNA or RNA), proteins, peptides, antibodies or 
antibody fragments, receptors, receptor ligands, members of a ligand pair, cytokines, 
hormones, oligosaccharides, synthetic organic molecules, drugs, and combinations 
thereof. 

Preferred MOIs include nucleic acid fragments. Preferred nucleic acid
fragments are primer sequences that are complementary to sequences present in vectors,
where the vectors are used for base sequencing. Preferably a nucleic acid fragment is
attached directly or indirectly to a tag at other than the 3' end of the fragment, and most
preferably at the 5' end of the fragment. Nucleic acid fragments may be purchased or
prepared based upon genetic databases (e.g., Dib et al., Nature 380:152-154, 1996 and
CEPH Genotype Database, http://www.cephb.fr) and commercial vendors (e.g.,
Promega, Madison, WI).

As used herein, MOI includes derivatives of an MOI that contain
functionality useful in joining the MOI to a T-L-Lh compound. For example, a nucleic
acid fragment that has a phosphodiester at the 5' end, where the phosphodiester is also
bonded to an alkyleneamine, is an MOI. Such an MOI is described in, e.g., U.S. Patent
4,762,779, which is incorporated herein by reference. A nucleic acid fragment with an
internal modification is also an MOI. An exemplary internal modification of a nucleic
acid fragment is where the base (e.g., adenine, guanine, cytosine, thymine, uracil) has
been modified to add a reactive functional group. Such internally modified nucleic acid
fragments are commercially available from, e.g., Glen Research, Herndon, VA.
Another exemplary internal modification of a nucleic acid fragment is where an abasic
phosphoramidate is used to synthesize a modified phosphodiester which is interposed
between a sugar and phosphate group of a nucleic acid fragment. The abasic
phosphoramidate contains a reactive group which allows a nucleic acid fragment that
contains this phosphoramidate-derived moiety to be joined to another moiety, e.g., a
T-L-Lh compound. Such abasic phosphoramidates are commercially available from, e.g.,
Clontech Laboratories, Inc., Palo Alto, CA.

4. Chemical Handles (Lh)

A chemical handle is a stable yet reactive atomic arrangement present as
part of a first molecule, where the handle can undergo chemical reaction with a
complementary chemical handle present as part of a second molecule, so as to form a
covalent bond between the two molecules. For example, the chemical handle may be a
hydroxyl group, and the complementary chemical handle may be a carboxylic acid
group (or an activated derivative thereof, e.g., a hydrofluoroaryl ester), whereupon
reaction between these two handles forms a covalent bond (specifically, an ester group)
that joins the two molecules together.

Chemical handles may be used in a large number of covalent bond-forming
reactions that are suitable for attaching tags to linkers, and linkers to MOIs.
Such reactions include alkylation (e.g., to form ethers, thioethers), acylation (e.g., to
form esters, amides, carbamates, ureas, thioureas), phosphorylation (e.g., to form
phosphates, phosphonates, phosphoramides, phosphonamides), sulfonylation (e.g., to
form sulfonates, sulfonamides), condensation (e.g., to form imines, oximes,
hydrazones), silylation, disulfide formation, and generation of reactive intermediates,
such as nitrenes or carbenes, by photolysis. In general, handles and bond-forming
reactions which are suitable for attaching tags to linkers are also suitable for attaching
linkers to MOIs, and vice-versa. In some cases, the MOI may undergo prior
modification or derivatization to provide the handle needed for attaching the linker.

One type of bond especially useful for attaching linkers to MOIs is the
disulfide bond. Its formation requires the presence of a thiol group ("handle") on the
linker, and another thiol group on the MOI. Mild oxidizing conditions then suffice to
bond the two thiols together as a disulfide. Disulfide formation can also be induced by
using an excess of an appropriate disulfide exchange reagent, e.g., pyridyl disulfides.
Because disulfide formation is readily reversible, the disulfide may also be used as the
cleavable bond for liberating the tag, if desired. This is typically accomplished under
similarly mild conditions, using an excess of an appropriate thiol exchange reagent, e.g.,
dithiothreitol.

Of particular interest for linking tags (or tags with linkers) to
oligonucleotides is the formation of amide bonds. Primary aliphatic amine handles can
be readily introduced onto synthetic oligonucleotides with phosphoramidites such as
6-monomethoxytritylhexylcyanoethyl-N,N-diisopropyl phosphoramidite (available from
Glen Research, Sterling, VA). The amines found on natural nucleotides such as
adenosine and guanosine are virtually unreactive when compared to the introduced
primary amine. This difference in reactivity forms the basis of the ability to selectively
form amides and related bonding groups (e.g., ureas, thioureas, sulfonamides) with the
introduced primary amine, and not the nucleotide amines.

As listed in the Molecular Probes catalog (Eugene, OR), a partial
enumeration of amine-reactive functional groups includes activated carboxylic esters,
isocyanates, isothiocyanates, sulfonyl halides, and dichlorotriazines. Active esters are
excellent reagents for amine modification since the amide products formed are very
stable. Also, these reagents have good reactivity with aliphatic amines and low
reactivity with the nucleotide amines of oligonucleotides. Examples of active esters
include N-hydroxysuccinimide esters, pentafluorophenyl esters, tetrafluorophenyl
esters, and p-nitrophenyl esters. Active esters are useful because they can be made from
virtually any molecule that contains a carboxylic acid. Methods to make active esters
are listed in Bodanszky (Principles of Peptide Chemistry (2d ed.), Springer Verlag,
London, 1993).

5. Linker Attachment 

Typically, a single type of linker is used to connect a particular set or
family of tags to a particular set or family of MOIs. In a preferred embodiment of the
invention, a single, uniform procedure may be followed to create all the various
T-L-MOI structures. This is especially advantageous when the set of T-L-MOI structures is
large, because it allows the set to be prepared using the methods of combinatorial
chemistry or other parallel processing technology. In a similar manner, the use of a
single type of linker allows a single, uniform procedure to be employed for cleaving all
the various T-L-MOI structures. Again, this is advantageous for a large set of T-L-MOI
structures, because the set may be processed in a parallel, repetitive, and/or automated
manner.

There are, however, other embodiments of the present invention, wherein
two or more types of linker are used to connect different subsets of tags to
corresponding subsets of MOIs. In this case, selective cleavage conditions may be used
to cleave each of the linkers independently, without cleaving the linkers present on
other subsets of MOIs.

A large number of covalent bond-forming reactions are suitable for
attaching tags to linkers, and linkers to MOIs. Such reactions include alkylation (e.g.,
to form ethers, thioethers), acylation (e.g., to form esters, amides, carbamates, ureas,
thioureas), phosphorylation (e.g., to form phosphates, phosphonates, phosphoramides,
phosphonamides), sulfonylation (e.g., to form sulfonates, sulfonamides), condensation
(e.g., to form imines, oximes, hydrazones), silylation, disulfide formation, and
generation of reactive intermediates, such as nitrenes or carbenes, by photolysis. In
general, handles and bond-forming reactions which are suitable for attaching tags to
linkers are also suitable for attaching linkers to MOIs, and vice-versa. In some cases,
the MOI may undergo prior modification or derivatization to provide the handle needed
for attaching the linker.

One type of bond especially useful for attaching linkers to MOIs is the 
disulfide bond. Its formation requires the presence of a thiol group ("handle") on the 
linker, and another thiol group on the MOI. Mild oxidizing conditions then suffice to 
bond the two thiols together as a disulfide. Disulfide formation can also be induced by 
using an excess of an appropriate disulfide exchange reagent, e.g., pyridyl disulfides.
Because disulfide formation is readily reversible, the disulfide may also be used as the 
cleavable bond for liberating the tag, if desired. This is typically accomplished under 
similarly mild conditions, using an excess of an appropriate thiol exchange reagent, e.g., 
dithiothreitol. 

Of particular interest for linking tags to oligonucleotides is the formation
of amide bonds. Primary aliphatic amine handles can be readily introduced onto
synthetic oligonucleotides with phosphoramidites such as
6-monomethoxytritylhexylcyanoethyl-N,N-diisopropyl phosphoramidite (available from
Glen Research, Sterling, VA). The amines found on natural nucleotides such as
adenosine and guanosine are virtually unreactive when compared to the introduced
primary amine. This difference in reactivity forms the basis of the ability to selectively
form amides and related bonding groups (e.g., ureas, thioureas, sulfonamides) with the
introduced primary amine, and not the nucleotide amines.

As listed in the Molecular Probes catalog (Eugene, OR), a partial
enumeration of amine-reactive functional groups includes activated carboxylic esters,
isocyanates, isothiocyanates, sulfonyl halides, and dichlorotriazines. Active esters are
excellent reagents for amine modification since the amide products formed are very
stable. Also, these reagents have good reactivity with aliphatic amines and low
reactivity with the nucleotide amines of oligonucleotides. Examples of active esters
include N-hydroxysuccinimide esters, pentafluorophenyl esters, tetrafluorophenyl
esters, and p-nitrophenyl esters. Active esters are useful because they can be made from
virtually any molecule that contains a carboxylic acid. Methods to make active esters
are listed in Bodanszky (Principles of Peptide Chemistry (2d ed.), Springer Verlag,
London, 1993).

Numerous commercial cross-linking reagents exist which can serve as
linkers (e.g., see Pierce Cross-linkers, Pierce Chemical Co., Rockford, IL). Among
these are homobifunctional amine-reactive cross-linking reagents, which are exemplified
by homobifunctional imidoesters and N-hydroxysuccinimidyl (NHS) esters. There also
exist heterobifunctional cross-linking reagents, which possess two or more different
reactive groups that allow for sequential reactions. Imidoesters react rapidly with amines at
alkaline pH. NHS-esters give stable products when reacted with primary or secondary
amines. Maleimides, alkyl and aryl halides, alpha-haloacyls and pyridyl disulfides are
thiol reactive. Maleimides are specific for thiol (sulfhydryl) groups in the pH range of
6.5 to 7.5, and at alkaline pH can become amine reactive. The thioether linkage is stable
under physiological conditions. Alpha-haloacetyl cross-linking reagents contain the
iodoacetyl group and are reactive towards sulfhydryls. Imidazoles can react with the
iodoacetyl moiety, but the reaction is very slow. Pyridyl disulfides react with thiol
groups to form a disulfide bond. Carbodiimides couple carboxyls to primary amines of
hydrazides, which gives rise to the formation of an acyl-hydrazine bond. The arylazides
are photoaffinity reagents which are chemically inert until exposed to UV or visible
light. When such compounds are photolyzed at 250-460 nm, a reactive aryl nitrene is
formed. The reactive aryl nitrene is relatively non-specific. Glyoxals are reactive
towards the guanidinyl portion of arginine.

In one typical embodiment of the present invention, a tag is first bonded
to a linker, then the combination of tag and linker is bonded to a MOI, to create the
structure T-L-MOI. Alternatively, the same structure is formed by first bonding a linker
to a MOI, and then bonding the combination of linker and MOI to a tag. An example is
where the MOI is a DNA primer or oligonucleotide. In that case, the tag is typically
first bonded to a linker, then the T-L is bonded to a DNA primer or oligonucleotide,
which is then used, for example, in a sequencing reaction.

One useful form in which a tag could be reversibly attached to an MOI
(e.g., an oligonucleotide or DNA sequencing primer) is through a chemically labile
linker. One preferred design for the linker allows the linker to be cleaved when exposed
to a volatile organic acid, for example, trifluoroacetic acid (TFA). TFA in particular is
compatible with most methods of MS ionization, including electrospray.

As described in detail below, the invention provides a method for
determining the sequence of a nucleic acid molecule. A composition which may be
formed by the inventive method comprises a plurality of compounds of the formula:

Tms-L-MOI

wherein Tms is an organic group detectable by mass spectrometry. Tms
contains carbon, at least one of hydrogen and fluoride, and may contain optional atoms
including oxygen, nitrogen, sulfur, phosphorus and iodine. In the formula, L is an
organic group which allows a Tms-containing moiety to be cleaved from the remainder
of the compound upon exposure of the compound to a cleavage condition. The cleaved
Tms-containing moiety includes a functional group which supports a single ionized
charge state when each of the plurality of compounds is subjected to mass spectrometry.
The functional group may be a tertiary amine, quaternary amine or an organic acid. In
the formula, MOI is a nucleic acid fragment which is conjugated to L via the 5' end of
the MOI. The term "conjugated" means that there may be chemical groups intermediate
L and the MOI, e.g., a phosphodiester group and/or an alkylene group. The nucleic acid
fragment may have a sequence complementary to a portion of a vector, wherein the
fragment is capable of priming nucleotide synthesis.

In the composition, no two compounds have either the same Tms or the
same MOI. In other words, the composition includes a plurality of compounds, wherein
each compound has both a unique Tms and a unique nucleic acid fragment (unique in
that it has a unique base sequence). In addition, the composition may be described as
having a plurality of compounds wherein each compound is defined as having a unique
Tms, where the Tms is unique in that no other compound has a Tms that provides the
same signal by mass spectrometry. The composition therefore contains a plurality of
compounds, each having a Tms with a unique mass. The composition may also be
described as having a plurality of compounds wherein each compound is defined as
having a unique nucleic acid sequence. These nucleic acid sequences are intentionally
unique so that each compound will serve as a primer for only one vector, when the
composition is combined with vectors for nucleic acid sequencing. The set of
compounds having unique Tms groups is the same set of compounds which has unique
nucleic acid sequences.

Preferably, the Tms groups are unique in that there is at least a 2 amu,
more preferably at least a 3 amu, and still more preferably at least a 4 amu mass
separation between the Tms groups of any two different compounds. In the
composition, there are at least 2 different compounds, preferably there are more than 2
different compounds, and more preferably there are more than 4 different compounds.
The composition may contain 100 or more different compounds, each compound having
a unique Tms and a unique nucleic acid sequence.
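The mass-separation preference above lends itself to a simple check on a candidate tag set. The following is a minimal sketch only: the function names and the example masses are hypothetical and are not taken from the specification, and the default threshold of 2 amu simply mirrors the least stringent preference stated above.

```python
def min_mass_separation(tag_masses):
    """Return the smallest pairwise mass difference (in amu) within a tag set."""
    ordered = sorted(tag_masses)
    if len(ordered) < 2:
        raise ValueError("at least two tag masses are required")
    # After sorting, the closest pair of masses is always an adjacent pair.
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def tags_are_resolvable(tag_masses, min_separation_amu=2.0):
    """True if every pair of tags differs by at least min_separation_amu."""
    return min_mass_separation(tag_masses) >= min_separation_amu


# Hypothetical tag masses (amu), chosen only to illustrate a 4 amu spacing.
example_tags = {"tag_1": 301.0, "tag_2": 305.0, "tag_3": 309.0, "tag_4": 313.0}
print(min_mass_separation(example_tags.values()))
print(tags_are_resolvable(example_tags.values(), min_separation_amu=4.0))
```

A set that fails the check (for example, two tags only 1.5 amu apart) would need one of the tags redesigned before the set could be used under the stated preference.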

Another composition that is useful in, e.g., determining the sequence of a
nucleic acid molecule, includes water and a compound of the formula Tms-L-MOI,
wherein Tms is an organic group detectable by mass spectrometry. Tms contains carbon,
at least one of hydrogen and fluoride, and may contain optional atoms including
oxygen, nitrogen, sulfur, phosphorus and iodine. In the formula, L is an organic group
which allows a Tms-containing moiety to be cleaved from the remainder of the
compound upon exposure of the compound to a cleavage condition. The cleaved
Tms-containing moiety includes a functional group which supports a single ionized charge
state when each of the plurality of compounds is subjected to mass spectrometry. The
functional group may be a tertiary amine, quaternary amine or an organic acid. In the
formula, MOI is a nucleic acid fragment attached at its 5' end.

In addition to water, this composition may contain a buffer, in order to
maintain the pH of the aqueous composition within the range of about 5 to about 9.
Furthermore, the composition may contain an enzyme, salts (such as MgCl2 and NaCl)
and one of dATP, dGTP, dCTP, and dTTP. A preferred composition contains water,
Tms-L-MOI and one (and only one) of ddATP, ddGTP, ddCTP, and ddTTP. Such a
composition is suitable for use in the dideoxy sequencing method.

The invention also provides a composition which contains a plurality of
sets of compounds, wherein each set of compounds has the formula:

Tms-L-MOI

wherein Tms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine. L is an organic group which allows a
Tms-containing moiety to be cleaved from the remainder of the compound, wherein the
Tms-containing moiety comprises a functional group which supports a single ionized charge
state when the compound is subjected to mass spectrometry and is selected from tertiary
amine, quaternary amine and organic acid. The MOI is a nucleic acid fragment wherein
L is conjugated to MOI at the MOI's 5' end.

Within a set, all members have the same Tms group, and the MOI
fragments have variable lengths that terminate with the same dideoxynucleotide
selected from ddAMP, ddGMP, ddCMP and ddTMP; and between sets, the Tms groups
differ by at least 2 amu, preferably by at least 3 amu. The plurality of sets is preferably
at least 5 and may number 100 or more.

In a preferred composition comprising a first plurality of sets as
described above, there is additionally present a second plurality of sets of compounds
having the formula

Tms-L-MOI

wherein Tms is an organic group detectable by mass spectrometry, comprising carbon, at
least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen,
sulfur, phosphorus and iodine. L is an organic group which allows a Tms-containing
moiety to be cleaved from the remainder of the compound, wherein the Tms-containing
moiety comprises a functional group which supports a single ionized charge state when
the compound is subjected to mass spectrometry and is selected from tertiary amine,
quaternary amine and organic acid. MOI is a nucleic acid fragment wherein L is
conjugated to MOI at the MOI's 5' end. All members within the second plurality have
an MOI sequence which terminates with the same dideoxynucleotide selected from
ddAMP, ddGMP, ddCMP and ddTMP; with the proviso that the dideoxynucleotide
present in the compounds of the first plurality is not the same dideoxynucleotide present
in the compounds of the second plurality.

The invention also provides a kit for DNA sequencing analysis. The kit
comprises a plurality of container sets, where each container set includes at least five
containers. The first container contains a vector. The second, third, fourth and fifth
containers contain compounds of the formula:

Tms-L-MOI

wherein Tms is an organic group detectable by mass spectrometry, comprising carbon, at
least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen,
sulfur, phosphorus and iodine. L is an organic group which allows a Tms-containing
moiety to be cleaved from the remainder of the compound, wherein the Tms-containing
moiety comprises a functional group which supports a single ionized charge state when
the compound is subjected to mass spectrometry and is selected from tertiary amine,
quaternary amine and organic acid. MOI is a nucleic acid fragment wherein L is
conjugated to MOI at the MOI's 5' end. The MOI for the second, third, fourth and fifth
containers is identical and complementary to a portion of the vector within the set of
containers, and the Tms group within each container is different from the other Tms
groups in the kit.

Preferably, within the kit, the plurality is at least 3, i.e., there are at least 
three sets of containers. More preferably, there are at least 5 sets of containers. 

As noted above, the present invention provides compositions and
methods for determining the sequence of nucleic acid molecules. Briefly, such methods
generally comprise the steps of (a) generating tagged nucleic acid fragments which are
complementary to a selected nucleic acid molecule (e.g., tagged fragments from a first
terminus to a second terminus of a nucleic acid molecule), wherein a tag is correlative
with a particular or selected nucleotide, and may be detected by any of a variety of
methods, (b) separating the tagged fragments by sequential length, (c) cleaving a tag
from a tagged fragment, and (d) detecting the tags, and thereby determining the
sequence of the nucleic acid molecule. Each of the aspects will be discussed in more
detail below.



B. SEQUENCING METHODS AND STRATEGIES

As noted above, the present invention provides methods for determining
the sequence of a nucleic acid molecule. Briefly, tagged nucleic acid fragments are
prepared. The nucleic acid fragments are complementary to a selected target nucleic
acid molecule. In a preferred embodiment, the nucleic acid fragments are produced
from a first terminus to a second terminus of a nucleic acid molecule, and more
preferably from a 5' terminus to a 3' terminus. In other preferred embodiments, the
tagged fragments are generated from 5'-tagged oligonucleotide primers or tagged
dideoxynucleotide terminators. A tag of a tagged nucleic acid fragment is correlative
with a particular nucleotide and is detectable by spectrometry (including fluorescence,
but preferably other than fluorescence), or by potentiometry. In a preferred
embodiment, at least five tagged nucleic acid fragments are generated and each tag is
unique for a nucleic acid fragment. More specifically, the number of tagged fragments
will generally range from about 5 to 2,000. The tagged nucleic acid fragments may be
generated from a variety of compounds, including those set forth above. It will be
evident to one in the art that the methods of the present invention are not limited to use
only of the representative compounds and compositions described herein.

Following generation of tagged nucleic acid fragments, the tagged
fragments are separated by sequential length. Such separation may be performed by a
variety of techniques. In a preferred embodiment, separation is by liquid
chromatography (LC) and particularly preferred is HPLC. Next, the tag is cleaved from
the tagged fragment. The particular method for breaking a bond to release the tag is
selected based upon the particular type of susceptibility of the bond to cleavage. For
example, a light-sensitive bond (i.e., one that breaks by light) will be exposed to light.
The released tag is detected by spectrometry or potentiometry. Preferred detection
means are mass spectrometry, infrared spectrometry, ultraviolet spectrometry and
potentiostatic amperometry (e.g., with an amperometric detector or coulometric
detector).

It will be appreciated by one in the art that one or more of the steps may
be automated, e.g., by use of an instrument. In addition, the separation, cleavage and
detection steps may be performed in a continuous manner (e.g., continuous
flow/continuous fluid path of tagged fragments through separation to cleavage to tag
detection). For example, the various steps may be incorporated into a system, such that
the steps are performed in a continuous manner. Such a system is typically in an
instrument or combination of instruments format. For example, tagged nucleic acid
fragments that are separated (e.g., by HPLC) may flow into a device for cleavage (e.g.,
a photo-reactor) and then into a tag detector (e.g., a mass spectrometer or coulometric or
amperometric detector). Preferably, the device for cleavage is tunable so that an
optimum wavelength for the cleavage reaction can be selected.
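The continuous separation, cleavage and detection path described above can be pictured as a stream of (elution position, detected tag mass) observations that is turned back into base calls. The sketch below is illustrative only. The tag table, sample names, mass tolerance and the assumption that shorter fragments reach the detector first are all hypothetical and are not specified in the text; each tag is assumed to identify both a sample and a terminating base.

```python
def decode_tag_stream(detections, tag_table, tolerance_amu=0.5):
    """Rebuild one read per sample from tags detected in elution order.

    detections: iterable of (elution_index, observed_mass) pairs.
    tag_table:  dict mapping nominal tag mass -> (sample_id, base).
    """
    reads = {}
    # Process observations in elution order (assumed shortest fragment first).
    for _, observed_mass in sorted(detections):
        for nominal_mass, (sample_id, base) in tag_table.items():
            if abs(observed_mass - nominal_mass) <= tolerance_amu:
                reads.setdefault(sample_id, []).append(base)
                break  # each observation is assigned to at most one tag
    return {sample_id: "".join(bases) for sample_id, bases in reads.items()}


# Hypothetical tags: four per sample, spaced 4 amu apart.
tag_table = {
    300.0: ("s1", "A"), 304.0: ("s1", "C"), 308.0: ("s1", "G"), 312.0: ("s1", "T"),
    316.0: ("s2", "A"), 320.0: ("s2", "C"), 324.0: ("s2", "G"), 328.0: ("s2", "T"),
}
detections = [(2, 308.1), (1, 300.1), (1, 320.0), (3, 312.0), (2, 316.2), (3, 327.9)]
print(decode_tag_stream(detections, tag_table))
```

Because every tag mass is distinct, fragments from several samples can share one separation run and still be assigned to the correct read, which is the multiplexing advantage the methods aim for.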

It will be apparent to one in the art that the methods of the present
invention for nucleic acid sequencing may be performed for a variety of purposes. For
example, such uses of the present methods include primary sequence determination for
viral, bacterial, prokaryotic and eukaryotic (e.g., mammalian) nucleic acid molecules;
mutation detection; diagnostics; forensics; identity; and polymorphism detection.

1. Sequencing Methods 

As noted above, compounds including those of the present invention
may be utilized for a variety of sequencing methods, including both enzymatic and
chemical degradation methods. Briefly, the enzymatic method described by Sanger
(Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy-terminators,
involves the synthesis of a DNA strand from a single-stranded template by a DNA
polymerase. The Sanger method of sequencing depends on the fact that
dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way
as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ from
normal deoxynucleotides (dNTPs) in that they lack the 3'-OH group necessary for chain
elongation. When a ddNTP is incorporated into the DNA chain, the absence of the
3'-hydroxy group prevents the formation of a new phosphodiester bond and the DNA
fragment is terminated with the ddNTP complementary to the base in the template
DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci.
(USA) 74:560, 1977) employs a chemical degradation method of the original DNA (in
both cases the DNA must be clonal). Both methods produce populations of fragments
that begin from a particular point and terminate in every base that is found in the DNA
fragment that is to be sequenced. The termination of each fragment is dependent on the
location of a particular base within the original DNA fragment. The DNA fragments
are separated by polyacrylamide gel electrophoresis and the order of the DNA bases
(A,C,T,G) is read from an autoradiograph of the gel.
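The logic of reading a sequence from chain-terminated fragments can be illustrated with a small idealized model. This is a sketch under simplifying assumptions, not a description of any laboratory protocol: it assumes that termination occurs at every possible position, ignores the primer length, and takes the template written in the order in which the polymerase copies it. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragment_lengths(template, dd_base):
    """Lengths of synthesized strands that end in the given dideoxy base.

    template is written in the order the polymerase copies it, so position i
    of the new strand is the complement of template[i].
    """
    synthesized = "".join(COMPLEMENT[base] for base in template)
    return [i + 1 for i, base in enumerate(synthesized) if base == dd_base]


def read_sequence(template):
    """Run four idealized termination reactions and read bases by length."""
    ladder = {}
    for dd_base in "ACGT":
        for length in terminated_fragment_lengths(template, dd_base):
            ladder[length] = dd_base  # each length occurs in exactly one reaction
    # Sorting by fragment length plays the role of the size separation step.
    return "".join(ladder[length] for length in sorted(ladder))


print(terminated_fragment_lengths("TACGGT", "C"))
print(read_sequence("TACGGT"))
```

Each fragment length appears in only one of the four reactions, so ordering the fragments by length and noting which reaction (or, in the present methods, which tag) each came from recovers the synthesized strand.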

2. Exonuclease DNA Sequencing

A procedure for determining DNA nucleotide sequences was reported by
Labeit et al. (S. Labeit, H. Lehrach & R. S. Goody, DNA 5:173-7, 1986; A new method
of DNA sequencing using deoxynucleoside alpha-thiotriphosphates). In the first step of
the method, four DNAs, each separately substituted with a different deoxynucleoside
phosphorothioate in place of the corresponding monophosphate, are prepared by
template-directed polymerization catalyzed by DNA polymerase. In the second step
these DNAs are subjected to stringent exonuclease III treatment, which produces only
fragments terminating with a phosphorothioate internucleotide linkage. These can then
be separated by standard gel electrophoresis techniques and the sequence can be read
directly as in presently used sequencing methods. Porter et al. (K. W. Porter, J.
Tomasz, F. Huang, A. Sood & B. R. Shaw, Biochemistry 34:11963-11969, 1995;
N7-cyanoborane-2'-deoxyguanosine 5'-triphosphate is a good substrate for DNA
polymerase) described a new set of boron-substituted nucleotide analogs which are also
exonuclease resistant and good substrates for a number of polymerases; these bases are
also suitable for exonuclease DNA sequencing.

3. A Simplified Strategy for Sequencing Large Numbers of Full Length cDNAs

cDNA sequencing has been suggested as an alternative to generating the
complete human genomic sequence. Two approaches have been attempted. The first
involves generation of expressed sequence tags (ESTs) through a single DNA sequence
pass at one end of each cDNA clone. This method has given insights into the
distribution of types of expressed sequences and has revealed occasional useful
homology with genomic fragments, but overall has added little to our knowledge base
since insufficient data from each clone is provided. The second approach is to generate
complete cDNA sequence which can indicate the possible function of the cDNAs.
Unfortunately most cDNAs are of a size range of 1-4 kilobases, which hinders the
automation of full-length sequence determination. Currently the most efficient method
for large scale, high throughput sequence production is sequencing from a
vector/primer site, which typically yields less than 500 bases of sequence from each
flank. The synthesis of new oligonucleotide primers of length 15-18 bases for 'primer
walking' can allow closure of each sequence. An alternative strategy for full length
cDNA sequencing is to generate modified templates that are suitable for sequencing
with a universal primer, but provide overlapping coverage of the molecules.
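A rough calculation shows why inserts in the 1-4 kilobase range are awkward for flank-only sequencing. The sketch below estimates how many primer-walking steps would be needed to close an insert after one read from each flank. It is a back-of-the-envelope model only: the 500-base read length is taken as the upper bound mentioned above, while the 50-base overlap between successive reads, single-pass coverage and the function name are assumptions made for illustration.

```python
import math


def walking_primers_needed(insert_length, read_length=500, overlap=50):
    """Estimate the walking primers needed after reading in from both flanks.

    Each walking read is assumed to re-read `overlap` known bases and add
    (read_length - overlap) new bases; coverage is single-pass.
    """
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be non-negative and shorter than a read")
    remaining = insert_length - 2 * read_length  # bases not reached from the flanks
    if remaining <= 0:
        return 0
    return math.ceil(remaining / (read_length - overlap))


for length in (1000, 1500, 4000):
    print(length, walking_primers_needed(length))
```

Under these assumptions a 1 kilobase insert closes from the two flank reads alone, whereas a 4 kilobase insert needs several custom primers, each of which must be designed and synthesized only after the preceding read is available.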

Shotgun sequencing methods can be applied to cDNA sequencing
studies by preparing a separate library from each cDNA clone. These methods have not
been used extensively for the analysis of the 1.5 - 4.0 kilobase fragments, however, as
they are very labor intensive during the initial cloning phase. Instead they have
generally been applied to projects where the target sequence is of the order of 15 to 40
kilobases, such as in lambda or cosmid inserts.

4. Analogy of cDNA with Genomic Sequencing 

Despite the typically different size of the individual clones to be
analyzed in cDNA sequencing, there are similarities with the requirements for large
scale genomic DNA sequencing. In addition to a low cost per base, and a high
throughput, the ideal strategy for full length cDNA sequencing will have a high
accuracy. The favored current methodology for genomic DNA sequencing involves the
preparation of shotgun sequencing libraries from cosmids, followed by random
sequencing using ABI fluorescent DNA sequencing instruments, and closure (finishing)
by directed efforts. Overall there is agreement that the fluorescent shotgun approach is
superior to current alternatives in terms of efficiency and accuracy. The initial shotgun
library quality is a critical determinant of the ease and quality of sequence assembly.
The high quality of the available shotgun library procedure has prompted a strategy for
the production of multiplex shotgun libraries containing mixtures of the smaller cDNA
clones. Here the individual clones to be sequenced are mixed prior to library
construction and then identified following random sequencing, at the stage of computer
analysis. Junctions between individual clones are labeled during library production
either by PCR or by identification of vector arm sequence.

Clones may be prepared either by microbial methods or by PCR. When
using PCR, three reactions from each clone are used in order to minimize the risk of
errors.

One pass sequencing is a new technique designed to speed the
identification of important sequences within a new region of genomic DNA. Briefly, a
high quality shotgun library is prepared and then the sequences sampled to obtain 80-
95% coverage. For a cosmid this would typically be about 200 samples. Essentially all
genes are likely to have at least one exon detected in this sample using either sequence
similarity (BLAST) or exon structure (GRAIL2) screening.

"Skimming" has been successfully applied to cosmids and P1s. One
pass sequencing is potentially the fastest and least expensive way to find genes in a
positional cloning project. The outcome is virtually assured. Most investigators are
currently developing cosmid contigs for exon trapping and related techniques. Cosmids
are completely suitable for sequence skimming. P1s and BACs could be
considerably cheaper since there are savings both in shotgun library construction and
in minimization of overlaps.
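
As a rough check on the sampling figures quoted above for one pass sequencing (about 200 samples per cosmid for 80-95% coverage), the following sketch applies the Lander-Waterman random-sampling model. The model itself, the 40 kilobase cosmid size and the 400 base read length are assumptions made for illustration; none of them is specified in this description.

```python
import math


def expected_coverage(n_reads: int, read_len_bp: int, target_len_bp: int) -> float:
    """Expected fraction of the target covered by at least one read.

    Lander-Waterman approximation: coverage = 1 - exp(-N * L / G).
    """
    redundancy = n_reads * read_len_bp / target_len_bp
    return 1.0 - math.exp(-redundancy)


def reads_for_coverage(fraction: float, read_len_bp: int, target_len_bp: int) -> int:
    """Smallest number of random reads expected to reach the given coverage."""
    redundancy = -math.log(1.0 - fraction)
    return math.ceil(redundancy * target_len_bp / read_len_bp)


COSMID_BP = 40_000  # assumed insert size, not taken from the text
READ_BP = 400       # assumed usable read length, not taken from the text

for n in (150, 200, 300):
    cov = expected_coverage(n, READ_BP, COSMID_BP)
    print(f"{n} reads -> {cov:.1%} expected coverage")
```

Under these assumptions, 200 reads correspond to two-fold redundancy and an expected coverage in the mid-80% range, which is consistent with the 80-95% window given above.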
5. Shotgun Sequencing

Shotgun DNA sequencing starts with random fragmentation of the target
DNA. Random sequencing is then used to generate the majority of the data. A directed
phase then completes gaps, ensuring coverage of each strand in both directions.
Shotgun sequencing offers the advantage of high accuracy at relatively low cost. The
procedure is best suited to the analysis of relatively large fragments and is the method
of choice in large scale genomic DNA sequencing.

There are several factors that are important in making shotgun
sequencing accurate and cost effective. A major consideration is the quality of the
shotgun library that is generated, since any clones that do not have inserts, or have
chimeric inserts, will result in subsequent inefficient sequencing. Another consideration
is the careful balancing of the random and the directed phases of the sequencing, so that
high accuracy is obtained with a minimal loss of efficiency through unnecessary
sequencing.



6. Sequencing Chemistry: Tagged-Terminator Chemistry

There are two types of fluorescent sequencing chemistries currently
available: dye primer, where the primer is fluorescently labeled, and dye terminator,
where the dideoxy terminators are labeled. Each of these chemistries can be used with
either Taq DNA polymerase or Sequenase enzyme. Sequenase enzyme seems to read
easily through G-C rich regions, palindromes, simple repeats and other difficult to read
sequences. Sequenase is also good for sequencing mixed populations. Sequenase
sequencing requires 5 ug of template, one extension and a multi-step cleanup process.
Tagged-primer sequencing requires four separate reactions, one for each of A, C, G and
T, and then a laborious cleanup protocol. Taq terminator cycle sequencing chemistry is
the most robust sequencing method. With this method any sequencing primer can be
used. The amount of template needed is relatively small and the whole reaction process
from setup to cleanup is reasonably easy, compared to Sequenase and dye primer
chemistries. Only 1.5 ug of DNA template and 4 pmol of primer are needed. To this a
ready reaction mix is added. This mix consists of buffer, enzyme, dNTPs and labeled
dideoxynucleotides. This reaction can be done in one tube as each of the four dideoxies
is labeled with a different fluorescent dye. These labeled terminators are present in this
mix in excess because they are difficult to incorporate during extension. With unclean
DNA the incorporation of these high molecular weight dideoxies can be inhibited. The
premix includes dITP to minimize band compression. The use of Taq as the DNA
polymerase allows the reactions to be run at high temperatures to minimize secondary
structure problems as well as non-specific primer binding. The whole cocktail goes
through 25 cycles of denaturation, annealing and extension in a thermal cycler and the
completed reaction is spun through a Sephadex G50 (Pharmacia, Piscataway, NJ)
column and is ready for gel loading after five minutes in a vacuum desiccator.

7. Designing Primers 

When designing primers, the same criteria should be used as for
designing PCR primers. In particular, primers should preferably be 18 to 20 nucleotides
long and the 3-prime end base should be a G or a C. Primers should also preferably
have a Tm of more than 50°C. Primers shorter than 18 nucleotides will work but are
not recommended. The shorter the primer, the greater the probability of it binding at
more than one site on the template DNA, and the lower its Tm. The sequence should
have a 100% match with the template. Any mismatch, especially towards the 3-prime end,
will greatly diminish sequencing ability. However, primers with 5-prime tails can be
used as long as there are about 18 bases at the 3-prime end that bind. If one is designing a primer
from a sequence chromatogram, an area with high confidence must be used. As one
moves out past 350 to 400 bases on a standard chromatogram, the peaks get broader and
the base calls are not as accurate. As described herein, the primer may possess a 5'
handle through which a linker or linker tag may be attached.
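
The primer criteria listed above (18 to 20 nucleotides, a G or C at the 3-prime end, and a Tm above 50°C) can be sketched as a simple screening function. The Tm estimate below uses the Wallace rule (2°C per A/T, 4°C per G/C), which is an assumption chosen for illustration because no Tm formula is given here; the example primer sequences are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees C: 2 * (A + T) + 4 * (G + C)."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def check_primer(primer: str, min_len: int = 18, max_len: int = 20,
                 min_tm: int = 50) -> list[str]:
    """Return a list of criteria the primer fails; an empty list means it passes."""
    s = primer.upper()
    problems = []
    if not min_len <= len(s) <= max_len:
        problems.append(f"length {len(s)} outside {min_len}-{max_len} nt")
    if not s or s[-1] not in "GC":
        problems.append("3-prime base is not G or C")
    if wallace_tm(s) <= min_tm:
        problems.append(f"estimated Tm {wallace_tm(s)} C is not above {min_tm} C")
    return problems


# Hypothetical sequences, used only to exercise the checks.
print(check_primer("ATGCGTACCTGAGCTAGC"))  # 18-mer ending in C
print(check_primer("ATATATATATAT"))        # short, AT-rich, ends in T
```

A screen of this kind only covers the three numerical criteria; the requirement for a 100% match with the template still has to be verified against the template sequence itself.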



8. Nucleic Acid Template Preparation

The most important factor in tagged-primer DNA sequencing is the
quality of the template. Briefly, one common misconception is that if a template works
in manual sequencing, it should work in automated sequencing. In fact, if a reaction
works in manual sequencing it may work in automated sequencing; however, automated
sequencing is much more sensitive and a poor quality template may result in little or no
data when fluorescent sequencing methods are utilized. High salt concentrations and
other cell material not properly extracted during template preparation, including RNA,
may likewise make it impossible to obtain accurate sequence information. Many mini
and maxi prep protocols produce DNA which is good enough for manual sequencing or
PCR, but not for automated (tagged-primer) sequencing. Also, the use of phenol is not
at all recommended as phenol can intercalate in the helix structure. The use of 100%
chloroform is sufficient. There are a number of DNA preparation methods which are
particularly preferred for the tagged primer sequencing methods provided herein. In
particular, maxi preps which utilize cesium chloride preparations or Qiagen
(Chatsworth, CA) maxi prep columns (being careful not to overload) are preferred. For
mini preps, columns such as Promega's Magic Mini prep (Madison, WI) may be
utilized. When sequencing DNA fragments such as PCR fragments or restriction cut
fragments, it is generally preferred to cut the desired fragment from a low melt agarose
gel and then purify with a product such as GeneClean (La Jolla, CA). It is very
important to make sure that only one band is cut from the gel. For PCR fragments the
PCR primers or internal primers can be used in order to ensure that the appropriate
fragment was sequenced. To get optimum performance from the sequence analysis
software, fragments should be larger than 200 bases. Double stranded or single
stranded DNA can be sequenced by this method.

An additional factor generally taken into account when preparing DNA
for sequencing is the choice of host strain. Companies selling equipment and reagents
for sequencing, such as ABI (Foster City, CA) and Qiagen (Chatsworth, CA), typically
recommend preferred host strains, and have previously recommended strains such as
DH5 alpha, HB101, XL-1 Blue, JM109, MV1190. Even when the DNA preparations
are very clean, there are other inherent factors which can make it difficult to obtain
sequence. G-C rich templates are always difficult to sequence through, and secondary
structure can also cause problems. Sequencing through long repeats often proves to
be difficult. For instance, as Taq moves along a poly T stretch, the enzyme often falls
off the template and jumps back on again, skipping a T. This results in extension
products with X Ts in the poly T stretch and fragments with X-1, X-2, etc.
Ts in the poly T stretch. The net effect is that more than one base appears in
each position, making the sequence impossible to read.



9. Use of Molecularly Distinct Cloning Vectors

Sequencing may also be accomplished utilizing a universal cloning vector
(M13) and complementary sequencing primers. Briefly, for present cloning vectors the
same primer sequence is used and only 4 tags are employed (each tag is a different
fluorophore which represents a different terminator (ddNTP)). Every amplification
process must therefore take place in a different container (one DNA sample per container). That
is, it is impossible to mix two or more different DNA samples in the same amplification
process. With only 4 tags available, only one DNA sample can be run per gel lane.
There is no convenient means to deconvolve the sequence of more than one DNA
sample with only 4 tags. (In this regard, workers in the field take great care not to mix
or contaminate different DNA samples when using current technologies.)

A substantial advantage is gained when multiples of 4 tags can be run
per gel lane or respective separation process. In particular, utilizing tags of the present
invention, more than one DNA sample in a single amplification reaction or container
can be processed. When multiples of 4 tags are available for use, each tag set can be
assigned to a particular DNA sample that is to be amplified. (A tag set is composed of a
series of 4 different tags each with a unique property. Each tag is assigned to represent
a different dideoxy-terminator, ddATP, ddGTP, ddCTP, or ddTTP.) To employ this
advantage a series of vectors must be generated in which a unique priming site is
inserted. A unique priming site is simply a stretch of 18 nucleotides which differs from
vector to vector. The remaining nucleotide sequence is conserved from vector to vector.
A sequencing primer is prepared (synthesized) which corresponds to each unique
vector. Each unique primer is derivatized (or labelled) with a unique tag set.

With these respective molecular biology tools in hand, it is possible in
the present invention to process multiple samples in a single container. First, DNA
samples which are to be sequenced are cloned into the multiplicity of vectors. For
example, if 100 unique vectors are available, 100 ligation reactions, plating steps, and
picking of plaques are performed. Second, one sample from each vector type is pooled,
making a pool of 100 unique vectors containing 100 unique DNA fragments or samples.
A given DNA sample is therefore identified and automatically assigned a primer set
with the associated tag set. The respective primers, buffers, polymerase(s), ddNTPs,
dNTPs and co-factors are added to the reaction container and the amplification process
is carried out. The reaction is then subjected to a separation step and the respective
sequence is established from the temporal appearance of tags. The ability to pool
multiple DNA samples has substantial advantages. The reagent cost of a typical PCR
reaction is about $2.00 per sample. With the method described herein the cost of
amplification on a per sample basis could be reduced at least by a factor of 100. Sample
handling could be reduced by a factor of at least 100, and materials costs could be
reduced. The need for large scale amplification robots would be obviated.

10. Sequencing Vectors for Cleavable Mass Spectroscopy Tagging

Using cleavable mass spectroscopy tagging (CMST) of the present
invention, each individual sequencing reaction can be read independently and
simultaneously as the separation proceeds. In CMST sequencing, a different primer is
used for each cloning vector: each reaction has 20 different primers when 20 clones are
used per pool. Each primer corresponds to one of the vectors, and each primer is tagged
with a unique CMST molecule. Four reactions are performed on each pooled DNA
sample (one for each base), so every vector has four oligonucleotide primers, each one
identical in sequence but tagged with a different CMS tag. The four separate
sequencing reactions are pooled and run together. When 20 samples are pooled, 80 tags
are used (4 bases per sample times 20 samples), and all 80 are detected simultaneously
as the gel is run.
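
The bookkeeping implied by this scheme (one tag per vector and base, with each sequence read from the order in which its tags appear) can be sketched as follows. This is an illustrative sketch only: the integer tag identifiers, the event format of (detection time, tag id) pairs and the function names are assumptions, not part of the disclosed method.

```python
from collections import defaultdict

BASES = "ACGT"


def build_tag_table(n_vectors: int) -> dict[int, tuple[int, str]]:
    """Assign one hypothetical tag id to every (vector, base) combination."""
    return {
        vector * len(BASES) + i: (vector, base)
        for vector in range(n_vectors)
        for i, base in enumerate(BASES)
    }


def deconvolve(events: list[tuple[float, int]],
               tag_table: dict[int, tuple[int, str]]) -> dict[int, str]:
    """Rebuild one read per vector from pooled (time, tag id) detections.

    Shorter fragments are assumed to be detected first, so ordering the
    events by time gives the bases in order of increasing fragment length.
    """
    reads = defaultdict(list)
    for _, tag_id in sorted(events):
        vector, base = tag_table[tag_id]
        reads[vector].append(base)
    return {vector: "".join(bases) for vector, bases in reads.items()}


table = build_tag_table(20)
print(len(table))  # 20 vectors x 4 bases

# Two interleaved samples: vector 0 (tags 0-3) and vector 7 (tags 28-31).
events = [(1.0, 0), (1.0, 31), (2.0, 1), (2.0, 31), (3.0, 2), (3.0, 28)]
print(deconvolve(events, table))
```

Because every tag maps to exactly one vector and one base, detections from different samples arriving at the same time can be separated without ambiguity, which is what allows the pooled reactions to share a single separation.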

The construction of the vectors may be accomplished by cloning a
random 20-mer on either side of a restriction site. The resulting clones are sequenced
and a number chosen for use as vectors. Two oligonucleotides are prepared for each
vector chosen, one homologous to the sequence at each side of the restriction site, and
each orientated so that the 3'-end is towards the restriction site. Four tagged
preparations of each primer are prepared, one for each base in the sequencing reactions
and each one labeled with a unique CMS tag.



11. Advantages of Sequencing by the Use of Reversible Tags

There are substantial advantages when cleavable tags are used in
sequencing and related technologies. First, an increase in sensitivity will contribute to
longer read lengths, as will the ability to collect tags for a specified period of time prior
to measurement. The use of cleavable tags permits the development of a system that
equalizes bandwidth over the entire range of the gel (1-1500 nucleotides (nt), for
example). This will greatly impact the ability to obtain read lengths greater than 450 nt.

The use of cleavable multiple tags (MW identifiers) also has the
advantage that multiple DNA samples can be run on a single gel lane or separation
process. For example, it is possible using the methodologies disclosed herein to
combine at least 96 samples and 4 sequencing reactions (A,G,T,C) on a single lane or
fragment sizing process. If multiple vectors are employed which possess unique
priming sites, then at least 384 samples can be combined per gel lane (the different
terminator reactions cannot be amplified together with this scheme). When the ability
to employ cleavable tags is combined with the ability to use multiple vectors, an
apparent 10,000-fold increase in DNA sequencing throughput is achieved. Also, in the
schemes described herein, reagent use and disposables are decreased, with a
resultant decrease in operating costs to the consumer.

An additional advantage is gained from the ability to process internal
controls throughout the entire methodology described here. For any set of samples, an
internal control nucleic acid can be placed in the sample(s). This is not possible with
the current configurations. This advantage permits the control of the amplification
process, the separation process, the tag detection system and sequence assembly. This
is an immense advantage over current systems in which the controls are always
separated from the samples in all steps.




The compositions and methods described herein also have the advantage
that they are modular in nature and can be fitted on any type of separation process or
method and, in addition, can be fitted onto any type of detection system as
improvements are made in either type of technology. For example, the
methodologies described herein can be coupled with "bundled" CE arrays or
microfabricated devices that enable separation of DNA fragments.

C. SEPARATION OF DNA FRAGMENTS 

A sample that requires analysis is often a mixture of many components
in a complex matrix. For samples containing unknown compounds, the components
must be separated from each other so that each individual component can be identified
by other analytical methods. The separation properties of the components in a mixture
are constant under constant conditions, and therefore once determined they can be used
to identify and quantify each of the components. Such procedures are typical in
chromatographic and electrophoretic analytical separations.

1. High-Performance Liquid Chromatography (HPLC) 

High-performance liquid chromatography (HPLC) is a chromatographic
separation technique for compounds that are dissolved in solution. HPLC
instruments consist of a reservoir of mobile phase, a pump, an injector, a separation
column, and a detector. Compounds are separated by injecting an aliquot of the sample
mixture onto the column. The different components in the mixture pass through the
column at different rates due to differences in their partitioning behavior between the
mobile liquid phase and the stationary phase.

Recently, ion-pair reversed-phase HPLC (IP-RP-HPLC) on non-porous PS/DVB particles with
chemically bonded alkyl chains has been shown to be a rapid alternative to capillary
electrophoresis in the analysis of both single and double-strand nucleic acids, providing
similar degrees of resolution (Huber et al., 1993, Anal. Biochem., 212, p351; Huber et
al., 1993, Nuc. Acids Res., 21, p1061; Huber et al., 1993, Biotechniques, 16, p898). In
contrast to ion-exchange chromatography, which does not always retain double-strand
DNA as a function of strand length (since AT base pairs interact with the positively
charged stationary phase more strongly than GC base pairs), IP-RP-HPLC enables a
strictly size-dependent separation.

A method has been developed in which, using 100 mM triethylammonium acetate
as the ion-pairing reagent, phosphodiester oligonucleotides could be successfully separated
on alkylated non-porous 2.3 um poly(styrene-divinylbenzene) particles by means of
high performance liquid chromatography (Oefner et al., 1994, Anal. Biochem., 223,
p39). The technique described allowed the separation of PCR products differing by only 4
to 8 base pairs in length within a size range of 50 to 200 nucleotides.



2. Electrophoresis 

Electrophoresis is a separation technique that is based on the mobility of
ions (or DNA, as is the case described herein) in an electric field. Negatively charged
DNA migrates toward a positive electrode and positively charged ions migrate
toward a negative electrode. For safety reasons one electrode is usually at ground and
the other is biased positively or negatively. Charged species have different migration
rates depending on their total charge, size, and shape, and can therefore be separated.
An electrophoresis apparatus consists of a high-voltage power supply, electrodes, buffer, and
a support for the buffer such as a polyacrylamide gel or a capillary tube. Open capillary
tubes are used for many types of samples and the other gel supports are usually used for
biological samples such as protein mixtures or DNA fragments.

3. Capillary Electrophoresis (CE)

Capillary electrophoresis (CE) in its various manifestations (free
solution, isotachophoresis, isoelectric focusing, polyacrylamide gel, micellar
electrokinetic "chromatography") is developing as a method for rapid high resolution
separations of very small sample volumes of complex mixtures. In combination with the
inherent sensitivity and selectivity of MS, CE-MS is a potentially powerful technique for
bioanalysis. In the novel application disclosed herein, the interfacing of these two
methods will lead to superior DNA sequencing methods that eclipse the rate of current
methods of sequencing by several orders of magnitude.

The correspondence between CE and electrospray ionization (ESI) flow
rates and the fact that both are facilitated by (and primarily used for) ionic species in
solution provide the basis for an extremely attractive combination. The combination of
both capillary zone electrophoresis (CZE) and capillary isotachophoresis with
quadrupole mass spectrometers based upon ESI has been described (Olivares et al.,
Anal. Chem. 59:1230, 1987; Smith et al., Anal. Chem. 60:436, 1988; Loo et al., Anal.
Chem. 179:404, 1989; Edmonds et al., J. Chromatog. 474:21, 1989; Loo et al.,
J. Microcolumn Sep. 1:223, 1989; Lee et al., J. Chromatog. 458:313, 1988; Smith et al.,
J. Chromatog. 480:211, 1989; Grese et al., J. Am. Chem. Soc. 111:2835, 1989). Small
peptides are easily amenable to CZE analysis with good (femtomole) sensitivity.

The most powerful separation method for DNA fragments is
polyacrylamide gel electrophoresis (PAGE), generally in a slab gel format. However,
the major limitation of the current technology is the relatively long time required to
perform the gel electrophoresis of DNA fragments produced in the sequencing
reactions. An order-of-magnitude (10-fold) increase in speed can be achieved with the
use of capillary electrophoresis, which utilizes ultrathin gels. In free solution, to a first
approximation, all DNA fragments migrate with the same mobility, as the addition of a
base results in the compensation of mass and charge. In polyacrylamide gels, DNA
fragments sieve and migrate as a function of length, and this approach has now been
applied to CE. Remarkable plate numbers per meter have now been achieved with cross-linked
polyacrylamide (10^7 plates per meter, Cohen et al., Proc. Natl. Acad. Sci. USA
85:9660, 1988). Such CE columns as described can be employed for DNA sequencing.
The method of CE is in principle 25 times faster than slab gel electrophoresis in a
standard sequencer. For example, about 300 bases can be read per hour. The separation
speed is limited in slab gel electrophoresis by the magnitude of the electric field which
can be applied to the gel without excessive heat production. Therefore, the greater speed
of CE is achieved through the use of higher field strengths (300 V/cm in CE versus 10
V/cm in slab gel electrophoresis). The capillary format reduces the amperage and thus
power and the resultant heat generation.

Smith and others (Smith et al., Nuc. Acids. Res. 18:4417, 1990) have
suggested employing multiple capillaries in parallel to increase throughput. Likewise,
Mathies and Huang (Mathies and Huang, Nature 359:167, 1992) have introduced
capillary electrophoresis in which separations are performed on a parallel array of
capillaries and demonstrated high through-put sequencing (Huang et al., Anal. Chem.
64:967, 1992; Huang et al., Anal. Chem. 64:2149, 1992). The major disadvantage of
capillary electrophoresis is the limited amount of sample that can be loaded onto the
capillary. By concentrating a large amount of sample at the beginning of the capillary,
prior to separation, loadability is increased, and detection levels can be lowered several
orders of magnitude. The most popular method of preconcentration in CE is sample
stacking. Sample stacking has recently been reviewed (Chien and Burgi, Anal. Chem.
64:489A, 1992). Sample stacking depends on the matrix difference (pH, ionic strength)
between the sample buffer and the capillary buffer, so that the electric field across the
sample zone is greater than in the capillary region. In sample stacking, a large volume of
sample in a low concentration buffer is introduced for preconcentration at the head of
the capillary column. The capillary is filled with a buffer of the same composition, but
at higher concentration. When the sample ions reach the capillary buffer and the lower
electric field, they stack into a concentrated zone. Sample stacking has increased
detectabilities 1-3 orders of magnitude.
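
The stacking effect described above can be put into rough numbers with an idealized model. The sketch below assumes that the field enhancement in the sample zone equals the ratio of capillary-buffer concentration to sample-buffer concentration, and that the sample plug is compressed by that same factor; both are simplifying assumptions that ignore diffusion and electroosmotic flow mismatch, and the example concentrations and plug length are hypothetical.

```python
import math


def field_enhancement(buffer_conc_mM: float, sample_conc_mM: float) -> float:
    """Idealized ratio of the field in the sample zone to the field in the buffer."""
    return buffer_conc_mM / sample_conc_mM


def stacked_zone_length(plug_length_mm: float, gamma: float) -> float:
    """Length of the sample zone after ideal stacking by a factor gamma."""
    return plug_length_mm / gamma


# Hypothetical conditions: 100 mM capillary buffer, sample in 1 mM buffer,
# 10 mm injected plug.
gamma = field_enhancement(100.0, 1.0)
print(f"field enhancement: {gamma:.0f}x")
print(f"stacked zone: {stacked_zone_length(10.0, gamma):.2f} mm")
print(f"ideal concentration gain: {math.log10(gamma):.0f} orders of magnitude")
```

In this idealized picture a 10- to 1000-fold dilution of the sample buffer relative to the capillary buffer corresponds to the 1-3 orders of magnitude improvement in detectability quoted above; real gains are expected to be lower.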

Another method of preconcentration is to apply isotachophoresis (ITP)
prior to the free zone CE separation of analytes. ITP is an electrophoretic technique
which allows microliter volumes of sample to be loaded on to the capillary, in contrast
to the low nL injection volumes typically associated with CE. The technique relies on
inserting the sample between two buffers (leading and trailing electrolytes) of higher
and lower mobility, respectively, than the analyte. The technique is inherently a
concentration technique, where the analytes concentrate into pure zones migrating with
the same speed. The technique is currently less popular than the stacking methods
described above because of the need for several choices of leading and trailing
electrolytes, and the ability to separate only cationic or anionic species during a
separation process.

The heart of the DNA sequencing process is the remarkably selective
electrophoretic separation of DNA or oligonucleotide fragments. It is remarkable
because each fragment is resolved and differs by only one nucleotide. Separations of up to
1000 fragments (1000 bp) have been obtained. A further advantage of sequencing with
cleavable tags is as follows. There is no requirement to use a slab gel format when
DNA fragments are separated by polyacrylamide gel electrophoresis when cleavable
tags are employed. Since numerous samples are combined (4 to 2000) there is no need
to run samples in parallel as is the case with current dye-primer or dye-terminator
methods (i.e., ABI373 sequencer). Since there is no reason to run parallel lanes, there is
no reason to use a slab gel. Therefore, one can employ a tube gel format for the
electrophoretic separation method. Grossman et al. (Genet. Anal. Tech. Appl.
9:9, 1992) have shown that considerable advantage is gained when a tube gel format is
used in place of a slab gel format. This is due to the greater ability to dissipate Joule
heat in a tube format compared to a slab gel, which results in faster run times (by 50%),
and much higher resolution of high molecular weight DNA fragments (greater than
1000 nt). Long reads are critical in genomic sequencing. Therefore, the use of cleavable
tags in sequencing has the additional advantage of allowing the user to employ the most
efficient and sensitive DNA separation method which also possesses the highest
resolution.

4. Microfabricated Devices 

Capillary electrophoresis (CE) is a powerful method for DNA
sequencing, forensic analysis, PCR product analysis and restriction fragment sizing. CE
is far faster than traditional slab PAGE since with capillary gels a far higher potential
field can be applied. However, CE has the drawback of allowing only one sample to be
processed per gel. Microfabricated devices combine the faster separation times of CE with the
ability to analyze multiple samples in parallel. The underlying concept behind the use
of microfabricated devices is the ability to increase the information density in
electrophoresis by miniaturizing the lane dimension to about 100 micrometers. The
electronics industry routinely uses microfabrication to make circuits with features of
less than one micron in size. The current density of capillary arrays is limited by the
outside diameter of the capillary tube. Microfabrication of channels produces a higher
density of arrays. Microfabrication also permits physical assemblies not possible with
glass fibers and links the channels directly to other devices on a chip. Few devices have
been constructed on microchips for separation technologies. A gas chromatograph
(Terry et al., IEEE Trans. Electron Device, ED-26:1880, 1979) and a liquid
chromatograph (Manz et al., Sens. Actuators B1:249, 1990) have been fabricated on
silicon chips, but these devices have not been widely used. Several groups have
reported separating fluorescent dyes and amino acids on microfabricated devices (Manz
et al., J. Chromatography 593:253, 1992; Effenhauser et al., Anal. Chem. 65:2637,
1993). Recently Woolley and Mathies (Woolley and Mathies, Proc. Natl. Acad. Sci.
91:11348, 1994) have shown that photolithography and chemical etching can be used to
make large numbers of separation channels on glass substrates. The channels are filled
with hydroxyethyl cellulose (HEC) separation matrices. It was shown that DNA
restriction fragments could be separated in as little as two minutes.

D. CLEAVAGE OF TAGS

As described above, different linker designs will confer cleavability
("lability") under different specific physical or chemical conditions. Examples of
conditions which serve to cleave various designs of linker include acid, base, oxidation,
reduction, fluoride, thiol exchange, photolysis, and enzymatic conditions.

Examples of cleavable linkers that satisfy the general criteria for linkers 
listed above will be well known to those in the art and include those found in the 
catalog available from Pierce (Rockford, IL). Examples include: 

• ethylene glycolbis(succinimidylsuccinate) (EGS), an amine reactive
cross-linking reagent which is cleavable by hydroxylamine (1 M at 37°C
for 3-6 hours);

• disuccinimidyl tartarate (DST) and sulfo-DST, which are amine reactive
cross-linking reagents, cleavable by 0.015 M sodium periodate;

• bis[2-(succinimidyloxycarbonyloxy)ethyl]sulfone (BSOCOES) and
sulfo-BSOCOES, which are amine reactive cross-linking reagents,
cleavable by base (pH 11.6);

• 1,4-di-[3'-(2'-pyridyldithio)propionamido]butane (DPDPB), a
pyridyldithiol crosslinker which is cleavable by thiol exchange or
reduction;

• N-[4-(p-azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide
(APDP), a pyridyldithiol crosslinker which is cleavable by thiol
exchange or reduction;

• bis-[beta-(4-azidosalicylamido)ethyl]-disulfide, a photoreactive
crosslinker which is cleavable by thiol exchange or reduction;

• N-succinimidyl-(4-azidophenyl)-1,3'-dithiopropionate (SADP), a
photoreactive crosslinker which is cleavable by thiol exchange or
reduction;

• sulfosuccinimidyl-2-(7-azido-4-methylcoumarin-3-acetamide)ethyl-1,3'-
dithiopropionate (SAED), a photoreactive crosslinker which is cleavable
by thiol exchange or reduction;

• sulfosuccinimidyl-2-(m-azido-o-nitrobenzamido)-ethyl-
1,3'-dithiopropionate (SAND), a photoreactive crosslinker which is
cleavable by thiol exchange or reduction.

Other examples of cleavable linkers and the cleavage conditions that can 
be used to release tags are as follows. A silyl linking group can be cleaved by fluoride 
or under acidic conditions. A 3-, 4-, 5-, or 6-substituted-2-nitrobenzyloxy or 2-, 3-, 5-, 
or 6-substituted-4-nitrobenzyloxy linking group can be cleaved by a photon source 
(photolysis). A 3-, 4-, 5-, or 6-substituted-2-alkoxyphenoxy or 2-, 3-, 5-, or 6-
substituted-4-alkoxyphenoxy linking group can be cleaved by Ce(NH4)2(NO3)6 
(oxidation). A NCO2 (urethane) linker can be cleaved by hydroxide (base), acid, or 
LiAlH4 (reduction). A 3-pentenyl, 2-butenyl, or 1-butenyl linking group can be cleaved 
by O3, OsO4/IO4-, or KMnO4 (oxidation). A 2-[3-, 4-, or 5-substituted-furyl]oxy linking 
group can be cleaved by O2, Br2, MeOH, or acid. 

Conditions for the cleavage of other labile linking groups include: 
t-alkyloxy linking groups can be cleaved by acid; methyl(dialkyl)methoxy or 4-
substituted-2-alkyl-1,3-dioxolane-2-yl linking groups can be cleaved by H3O+; 
2-silylethoxy linking groups can be cleaved by fluoride or acid; 2-(X)-ethoxy (where 
X = keto, ester amide, cyano, NO2, sulfide, sulfoxide, sulfone) linking groups can be 
cleaved under alkaline conditions; 2-, 3-, 4-, 5-, or 6-substituted-benzyloxy linking 
groups can be cleaved by acid or under reductive conditions; 2-butenyloxy linking 
groups can be cleaved by (Ph3P)3RhCl(H); 3-, 4-, 5-, or 6-substituted-2-bromophenoxy 
linking groups can be cleaved by Li, Mg, or BuLi; methylthiomethoxy linking groups 
can be cleaved by Hg2+; 2-(X)-ethyloxy (where X = a halogen) linking groups can be 
cleaved by Zn or Mg; 2-hydroxyethyloxy linking groups can be cleaved by oxidation 
(e.g., with Pb(OAc)4). 

Preferred linkers are those that are cleaved by acid or photolysis. Several 
of the acid-labile linkers that have been developed for solid phase peptide synthesis are 
useful for linking tags to MOIs. Some of these linkers are described in a recent review 
by Lloyd-Williams et al. (Tetrahedron 49:11065-11133, 1993). One useful type of 
linker is based upon p-alkoxybenzyl alcohols, of which two, 
4-hydroxymethylphenoxyacetic acid and 4-(4-hydroxymethyl-3-methoxyphenoxy)butyric 
acid, are commercially available from Advanced ChemTech (Louisville, KY). Both 
linkers can be attached to a tag via an ester linkage to the benzylalcohol, and to an 
amine-containing MOI via an amide linkage to the carboxylic acid. Tags linked by 
these molecules are released from the MOI with varying concentrations of 
trifluoroacetic acid. The cleavage of these linkers results in the liberation of a 
carboxylic acid on the tag. Acid cleavage of tags attached through related linkers, such 
as 2,4-dimethoxy-4'-(carboxymethyloxy)-benzhydrylamine (available from Advanced 
ChemTech in FMOC-protected form), results in liberation of a carboxylic amide on the 
released tag. 

The photolabile linkers useful for this application have also been for the 
most part developed for solid phase peptide synthesis (see Lloyd-Williams review). 
These linkers are usually based on 2-nitrobenzylesters or 2-nitrobenzylamides. Two 
examples of photolabile linkers that have recently been reported in the literature are 
4-(4-(1-(Fmoc-amino)ethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (Holmes and Jones, 
J. Org. Chem. 60:2318-2319, 1995) and 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic 
acid (Brown et al., Molecular Diversity 1:4-12, 1995). Both linkers can be attached via 
the carboxylic acid to an amine on the MOI. The attachment of the tag to the linker is 
made by forming an amide between a carboxylic acid on the tag and the amine on the 
linker. Cleavage of photolabile linkers is usually performed with UV light of 350 nm 
wavelength at intensities and times known to those in the art. Examples of commercial 
sources of instruments for photochemical cleavage are Aura Industries Inc. (Staten 
Island, NY) and Agrenetics (Wilmington, MA). Cleavage of the linkers results in 
liberation of a primary amide on the tag. Examples of photocleavable linkers include 
nitrophenyl glycine esters, exo- and endo-2-benzonorborneyl chlorides and methane 
sulfonates, and 3-amino-3(2-nitrophenyl) propionic acid. Examples of enzymatic 
cleavage include esterases which will cleave ester bonds, nucleases which will cleave 
phosphodiester bonds, proteases which cleave peptide bonds, etc. 

E. DETECTION OF TAGS 

Detection methods typically rely on the absorption and emission in some 
type of spectral field. When atoms or molecules absorb light, the incoming energy 
excites a quantized structure to a higher energy level. The type of excitation depends on 
the wavelength of the light. Electrons are promoted to higher orbitals by ultraviolet or 
visible light, molecular vibrations are excited by infrared light, and rotations are excited 
by microwaves. An absorption spectrum is the absorption of light as a function of 
wavelength. The spectrum of an atom or molecule depends on its energy level 
structure. Absorption spectra are useful for identification of compounds. Specific 
absorption spectroscopic methods include atomic absorption spectroscopy (AA), 
infrared spectroscopy (IR), and UV-vis spectroscopy (uv-vis). 

Atoms or molecules that are excited to high energy levels can decay to 
lower levels by emitting radiation. This light emission is called fluorescence if the 
transition is between states of the same spin, and phosphorescence if the transition 
occurs between states of different spin. The emission intensity of an analyte is linearly 
proportional to concentration (at low concentrations), and is useful for quantifying the 
emitting species. Specific emission spectroscopic methods include atomic emission 
spectroscopy (AES), atomic fluorescence spectroscopy (AFS), molecular laser-induced 
fluorescence (LIF), and X-ray fluorescence (XRF). 

When electromagnetic radiation passes through matter, most of the 
radiation continues in its original direction but a small fraction is scattered in other 
directions. Light that is scattered at the same wavelength as the incoming light is called 
Rayleigh scattering. Light that is scattered in transparent solids due to vibrations 
(phonons) is called Brillouin scattering. Brillouin scattering is typically shifted by 0.1 
to 1 wavenumber from the incident light. Light that is scattered due to vibrations in 
molecules or optical phonons in opaque solids is called Raman scattering. Raman 
scattered light is shifted by as much as 4000 wavenumbers from the incident light. 
Specific scattering spectroscopic methods include Raman spectroscopy. 

IR spectroscopy is the measurement of the wavelength and intensity of 
the absorption of mid-infrared light by a sample. Mid-infrared light (2.5 - 50 µm, 4000 
- 200 cm⁻¹) is energetic enough to excite molecular vibrations to higher energy levels. 
The wavelengths of IR absorption bands are characteristic of specific types of chemical 
bonds and IR spectroscopy is generally most useful for identification of organic and 
organometallic molecules. 

Near-infrared absorption spectroscopy (NIR) is the measurement of the 
wavelength and intensity of the absorption of near-infrared light by a sample. 
Near-infrared light spans the 800 nm - 2.5 µm (12,500 - 4000 cm⁻¹) range and is energetic 
enough to excite overtones and combinations of molecular vibrations to higher energy 
levels. NIR spectroscopy is typically used for quantitative measurement of organic 
functional groups, especially O-H, N-H, and C=O. The components and design of NIR 
instrumentation are similar to uv-vis absorption spectrometers. The light source is 
usually a tungsten lamp and the detector is usually a PbS solid-state detector. Sample 
holders can be glass or quartz and typical solvents are CCl4 and CS2. The convenient 
instrumentation of NIR spectroscopy makes it suitable for on-line monitoring and 
process control. 

Ultraviolet and visible absorption spectroscopy (uv-vis) is 
the measurement of the wavelength and intensity of absorption of near-ultraviolet and 
visible light by a sample. Absorption in the vacuum UV occurs at 100-200 nm (10⁵ - 
50,000 cm⁻¹), in the quartz UV at 200-350 nm (50,000 - 28,570 cm⁻¹), and in the visible 
at 350-800 nm (28,570 - 12,500 cm⁻¹), and is described by the Beer-Lambert-Bouguer law. 
Ultraviolet and visible light are energetic enough to promote outer electrons to higher 
energy levels. UV-vis spectroscopy can usually be applied to molecules and inorganic 
ions or complexes in solution. The uv-vis spectra are limited by the broad features of 
the spectra. The light source is usually a hydrogen or deuterium lamp for uv 
measurements and a tungsten lamp for visible measurements. The wavelengths of these 
continuous light sources are selected with a wavelength separator such as a prism or 
grating monochromator. Spectra are obtained by scanning the wavelength separator and 
quantitative measurements can be made from a spectrum or at a single wavelength. 
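
The wavelength and wavenumber ranges quoted for the infrared, near-infrared, ultraviolet and visible regions are related by a simple reciprocal. The following is a minimal sketch of that conversion, offered only as an illustration; the function name and the list of boundary wavelengths are chosen here for convenience and are not part of the described methods.

```python
def wavelength_nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1.

    1 cm = 1e7 nm, so wavenumber (cm^-1) = 1e7 / wavelength (nm).
    """
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1.0e7 / wavelength_nm


# Region boundaries mentioned in the text (100 nm up to 2.5 um = 2500 nm).
boundaries_nm = [100, 200, 350, 800, 2500]

for nm in boundaries_nm:
    print(f"{nm:>5} nm -> {wavelength_nm_to_wavenumber(nm):>9.0f} cm^-1")
```

Under this conversion, 200 nm corresponds to 50,000 cm⁻¹ and 800 nm to 12,500 cm⁻¹, in agreement with the region boundaries given in the text.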

Mass spectrometers use the difference in the mass-to-charge ratio (m/z) 
of ionized atoms or molecules to separate them from each other. Mass spectrometry is 
therefore useful for quantitation of atoms or molecules and also for determining 
chemical and structural information about molecules. Molecules have distinctive 
fragmentation patterns that provide structural information to identify compounds. The 
general operations of a mass spectrometer are as follows. Gas-phase ions are created, 
the ions are separated in space or time based on their mass-to-charge ratio, and the 
quantity of ions of each mass-to-charge ratio is measured. The ion separation power of 
a mass spectrometer is described by the resolution, which is defined as R = m / delta m, 
where m is the ion mass and delta m is the difference in mass between two resolvable 
peaks in a mass spectrum. For example, a mass spectrometer with a resolution of 1000 
can resolve an ion with a m/z of 100.0 from an ion with a m/z of 100.1. 
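
The resolution example above can be checked numerically. The sketch below simply applies the definition R = m / delta m in both directions; the function names are illustrative and are not taken from any instrument software.

```python
import math


def resolving_power(m: float, delta_m: float) -> float:
    """Resolution R = m / delta_m for two just-resolvable peaks."""
    if delta_m <= 0:
        raise ValueError("delta_m must be positive")
    return m / delta_m


def smallest_resolvable_difference(m: float, resolution: float) -> float:
    """Smallest mass difference delta_m = m / R resolvable at mass m."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return m / resolution


# The example from the text: peaks at m/z 100.0 and 100.1.
print(resolving_power(100.0, 0.1))
# At the same resolution, the resolvable spacing grows with mass.
print(smallest_resolvable_difference(1000.0, 1000.0))
```

Because delta m scales with m at fixed R, an instrument that just separates 100.0 from 100.1 would separate peaks near m/z 1000 only if they are about one mass unit apart.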

In general, a mass spectrometer (MS) consists of an ion source, a 
mass-selective analyzer, and an ion detector. The magnetic-sector, quadrupole, and 
time-of-flight designs also require extraction and acceleration ion optics to transfer ions from 
the source region into the mass analyzer. The details of several mass analyzer designs 
(for magnetic-sector MS, quadrupole MS or time-of-flight MS) are discussed below. 
Single-focusing analyzers for magnetic-sector MS utilize a particle beam path of 180, 
90, or 60 degrees. The various forces influencing the particle separate ions with 
different mass-to-charge ratios. In double-focusing analyzers, an electrostatic 
analyzer is added to separate particles with different 
kinetic energies. 

A quadrupole mass filter for quadrupole MS consists of four metal rods 
arranged in parallel. The applied voltages affect the trajectory of ions traveling down 
the flight path centered between the four rods. For given DC and AC voltages, only 
ions of a certain mass-to-charge ratio pass through the quadrupole filter and all other 
ions are thrown out of their original path. A mass spectrum is obtained by monitoring 
the ions passing through the quadrupole filter as the voltages on the rods are varied. 

A time-of-flight mass spectrometer uses the differences in transit time 
through a "drift region" to separate ions of different masses. It operates in a pulsed 
mode so ions must be produced in pulses and/or extracted in pulses. A pulsed electric 
field accelerates all ions into a field-free drift region with a kinetic energy of qV, where 
q is the ion charge and V is the applied voltage. Since the ion kinetic energy is 
0.5 mv², lighter ions have a higher velocity than heavier ions and reach the detector at 
the end of the drift region sooner. The output of an ion detector is displayed on an 
oscilloscope as a function of time to produce the mass spectrum. 
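
The time-of-flight relationship described above follows from setting qV equal to 0.5 mv² and dividing the drift length by the resulting velocity. The following minimal sketch illustrates this; the 20 kV accelerating voltage and 1 m drift length are assumed values chosen for illustration only and do not describe any particular instrument.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs per unit charge
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def flight_time_s(mass_da: float,
                  charge_state: int = 1,
                  accel_voltage_v: float = 20_000.0,   # assumed value
                  drift_length_m: float = 1.0) -> float:  # assumed value
    """Drift time of an ion through a field-free region.

    From q*V = 0.5*m*v**2, v = sqrt(2*q*V/m) and t = L / v.
    """
    if min(mass_da, charge_state, accel_voltage_v, drift_length_m) <= 0:
        raise ValueError("all parameters must be positive")
    mass_kg = mass_da * DALTON_KG
    charge_c = charge_state * ELEMENTARY_CHARGE_C
    velocity = math.sqrt(2.0 * charge_c * accel_voltage_v / mass_kg)
    return drift_length_m / velocity


for mass in (100.0, 400.0, 1600.0):
    print(f"{mass:7.1f} Da -> {flight_time_s(mass) * 1e6:6.2f} microseconds")
```

For ions of equal charge, the flight time scales with the square root of the mass, so a fourfold increase in mass doubles the arrival time.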

The ion formation process is the starting point for mass spectrometric 
analyses. Chemical ionization is a method that employs a reagent ion to react with the 
analyte molecules (tags) to form ions by either a proton or hydride transfer. The reagent 
ions are produced by introducing a large excess of methane (relative to the tag) into an 
electron impact (EI) ion source. Electron collisions produce CH4+ and CH3+, which 
further react with methane to form CH5+ and C2H5+. Another method to ionize tags is by 
plasma and glow discharge. Plasma is a hot, partially-ionized gas that effectively 
excites and ionizes atoms. A glow discharge is a low-pressure plasma maintained 
between two electrodes. Electron impact ionization employs an electron beam, usually 
generated from a tungsten filament, to ionize gas-phase atoms or molecules. An 
electron from the beam knocks an electron off analyte atoms or molecules to create 
ions. Electrospray ionization utilizes a very fine needle and a series of skimmers. A 
sample solution is sprayed into the source chamber to form droplets. The droplets carry 
charge when they exit the capillary and, as the solvent vaporizes, the droplets disappear, 
leaving highly charged analyte molecules. ESI is particularly useful for large biological 
molecules that are difficult to vaporize or ionize. Fast-atom bombardment (FAB) 
utilizes a high-energy beam of neutral atoms, typically Xe or Ar, that strikes a solid 
sample causing desorption and ionization. It is used for large biological molecules that 
are difficult to get into the gas phase. FAB causes little fragmentation and usually gives 
a large molecular ion peak, making it useful for molecular weight determination. The 
atomic beam is produced by accelerating ions from an ion source through a 
charge-exchange cell. The ions pick up an electron in collisions with neutral atoms to form a 
beam of high energy atoms. Laser ionization (LIMS) is a method in which a laser pulse 
ablates material from the surface of a sample and creates a microplasma that ionizes 
some of the sample constituents. Matrix-assisted laser desorption ionization (MALDI) 
is a LIMS method of vaporizing and ionizing large biological molecules such as 
proteins or DNA fragments. The biological molecules are dispersed in a solid matrix 
such as nicotinic acid. A UV laser pulse ablates the matrix which carries some of the 
large molecules into the gas phase in an ionized form so they can be extracted into a 
mass spectrometer. Plasma-desorption ionization (PD) utilizes the decay of 252Cf, which 
produces two fission fragments that travel in opposite directions. One fragment strikes 
the sample knocking out 1-10 analyte ions. The other fragment strikes a detector and 
triggers the start of data acquisition. This ionization method is especially useful for 
large biological molecules. Resonance ionization (RIMS) is a method in which one or 
more laser beams are tuned in resonance to transitions of a gas-phase atom or molecule 
to promote it in a stepwise fashion above its ionization potential to create an ion. 
Secondary ionization (SIMS) utilizes an ion beam, such as 3He+, 16O+, or 40Ar+, which is 
focused onto the surface of a sample and sputters material into the gas phase. Spark 
source is a method which ionizes analytes in solid samples by pulsing an electric current 
across two electrodes. 

A tag may become charged prior to, during or after cleavage from the 
molecule to which it is attached. Ionization methods based on ion "desorption", the 
direct formation or emission of ions from solid or liquid surfaces have allowed 
increasing application to nonvolatile and thermally labile compounds. These methods 
eliminate the need for neutral molecule volatilization prior to ionization and generally 
minimize thermal degradation of the molecular species. These methods include field 
desorption (Beckey, Principles of Field Ionization and Field Desorption Mass 
Spectrometry, Pergamon, Oxford, 1977), plasma desorption (Sundqvist and Macfarlane, 
Mass Spectrom. Rev. 4:421, 1985), laser desorption (Karas and Hillenkamp, Anal. 
Chem. 60:2299, 1988; Karas et al., Angew. Chem. 101:805, 1989), fast particle 
bombardment (e.g., fast atom bombardment, FAB, and secondary ion mass 
spectrometry, SIMS; Barber et al., Anal. Chem. 54:645A, 1982), and thermospray (TS) 
ionization (Vestal, Mass Spectrom. Rev. 2:447, 1983). Thermospray is broadly applied 
for the on-line combination with liquid chromatography. The continuous flow FAB 
methods (Caprioli et al., Anal. Chem. 58:2949, 1986) have also shown significant 
potential. A more complete listing of ionization/mass spectrometry combinations includes 
ion-trap mass spectrometry, electrospray ionization mass spectrometry, ion-spray mass 
spectrometry, liquid ionization mass spectrometry, atmospheric pressure ionization 
mass spectrometry, electron ionization mass spectrometry, metastable atom 
bombardment ionization mass spectrometry, fast atom bombardment ionization mass 
spectrometry, MALDI mass spectrometry, photo-ionization time-of-flight mass 
spectrometry, laser droplet mass spectrometry, MALDI-TOF mass spectrometry, APCI 
mass spectrometry, nano-spray mass spectrometry, nebulised spray ionization mass 
spectrometry, chemical ionization mass spectrometry, resonance ionization mass 
spectrometry, secondary ionization mass spectrometry, and thermospray mass spectrometry. 

The ionization methods amenable to nonvolatile biological compounds 
have overlapping ranges of applicability. Ionization efficiencies are highly dependent 
on matrix composition and compound type. Currently available results indicate that the 
upper molecular mass for TS is about 8000 daltons (Jones and Krolik, Rapid Comm. 
Mass Spectrom. 1:67, 1987). Since TS is practiced mainly with quadrupole mass 
spectrometers, sensitivity typically suffers disproportionately at higher mass-to-charge 
ratios (m/z). Time-of-flight (TOF) mass spectrometers are commercially available and 
possess the advantage that the m/z range is limited only by detector efficiency. 
Recently, two additional ionization methods have been introduced. These two methods 
are now referred to as matrix-assisted laser desorption (MALDI; Karas and Hillenkamp, 
Anal. Chem. 60:2299, 1988; Karas et al., Angew. Chem. 101:805, 1989) and 
electrospray ionization (ESI). Both methodologies have very high ionization efficiency 
(i.e., very high [molecular ions produced]/[molecules consumed]). Sensitivity, which 
defines the ultimate potential of the technique, is dependent on sample size, quantity of 
ions, flow rate, detection efficiency and actual ionization efficiency. 

Electrospray-MS is based on an idea first proposed in the 1960s (Dole et 
al., J. Chem. Phys. 49:2240, 1968). Electrospray ionization (ESI) is one means to 
produce charged molecules for analysis by mass spectroscopy. Briefly, electrospray 
ionization produces highly charged droplets by nebulizing liquids in a strong 
electrostatic field. The highly charged droplets, generally formed in a dry bath gas at 
atmospheric pressure, shrink by evaporation of neutral solvent until the charge 
repulsion overcomes the cohesive forces, leading to a "Coulombic explosion". The 
exact mechanism of ionization is controversial and several groups have put forth 
hypotheses (Blades et al., Anal. Chem. 63:2109-14, 1991; Kebarle et al., Anal. Chem. 
65:A972-86, 1993; Fenn, J. Am. Soc. Mass. Spectrom. 4:524-35, 1993). Regardless of 
the ultimate process of ion formation, ESI produces charged molecules from solution 
under mild conditions. 

The ability to obtain useful mass spectral data on small amounts of an 
organic molecule relies on the efficient production of ions. The efficiency of ionization 
for ESI is related to the extent of positive charge associated with the molecule. 
Improving ionization experimentally has usually involved using acidic conditions. 
Another method to improve ionization has been to use quaternary amines when possible 
(see Aebersold et al., Protein Science 1:494-503, 1992; Smith et al., Anal. Chem. 
60:436-41, 1988). 

Electrospray ionization is described in more detail as follows. 
Electrospray ion production requires two steps: dispersal of highly charged droplets at 
near atmospheric pressure, followed by conditions to induce evaporation. A solution of 
analyte molecules is passed through a needle that is kept at high electric potential. At 
the end of the needle, the solution disperses into a mist of small, highly charged droplets 
containing the analyte molecules. The small droplets evaporate quickly and, by a 
process of field desorption or residual evaporation, protonated protein molecules are 
released into the gas phase. An electrospray is generally produced by application of a 
high electric field to a small flow of liquid (generally 1-10 uL/min) from a capillary 
tube. A potential difference of 3-6 kV is typically applied between the capillary and 
counter electrode located 0.2-2 cm away (where ions, charged clusters, and even 
charged droplets, depending on the extent of desolvation, may be sampled by the MS 
through a small orifice). The electric field results in charge accumulation on the liquid 
surface at the capillary terminus; thus the liquid flow rate, resistivity, and surface 
tension are important factors in droplet production. The high electric field results in 
disruption of the liquid surface and formation of highly charged liquid droplets. 
Positively or negatively charged droplets can be produced depending upon the capillary 
bias. The negative ion mode requires the presence of an electron scavenger such as 
oxygen to inhibit electrical discharge. 

A wide range of liquids can be sprayed electrostatically into a vacuum, 
or with the aid of a nebulizing agent. The use of only electric fields for nebulization 
leads to some practical restrictions on the range of liquid conductivity and dielectric 
constant. Solution conductivity of less than 10⁻⁵ ohms is required at room temperature 
for a stable electrospray at useful liquid flow rates, corresponding to an aqueous 
electrolyte solution of <10⁻⁴ M. In the mode found most useful for ESI-MS, an 
appropriate liquid flow rate results in dispersion of the liquid as a fine mist. A short 
distance from the capillary the droplet diameter is often quite uniform and on the order 
of 1 µm. Of particular importance is that the total electrospray ion current increases 
only slightly for higher liquid flow rates. There is evidence that heating is useful for 
manipulating the electrospray. For example, slight heating allows aqueous solutions to 
be readily electrosprayed, presumably due to the decreased viscosity and surface 
tension. Both thermally-assisted and gas-nebulization-assisted electrosprays allow 
higher liquid flow rates to be used, but decrease the extent of droplet charging. The 
formation of molecular ions requires conditions effecting evaporation of the initial 
droplet population. This can be accomplished at higher pressures by a flow of dry gas 
at moderate temperatures (<60°C), by heating during transport through the interface, 
and (particularly in the case of ion trapping methods) by energetic collisions at 
relatively low pressure. 

Although the detailed processes underlying ESI remain uncertain, the 
very small droplets produced by ESI appear to allow almost any species carrying a net 
charge in solution to be transferred to the gas phase after evaporation of residual 
solvent. Mass spectrometric detection then requires that ions have a tractable m/z range 
(<4000 daltons for quadrupole instruments) after desolvation, and that they be produced 
and transmitted with sufficient efficiency. The wide range of solutes already found to 
be amenable to ESI-MS, and the lack of substantial dependence of ionization efficiency 
upon molecular weight, suggest a highly non-discriminating and broadly applicable 
ionization process. 

The electrospray ion "source" functions at near atmospheric pressure. 
The electrospray "source" is typically a metal or glass capillary incorporating a method 
for electrically biasing the liquid solution relative to a counter electrode. Solutions, 
typically water-methanol mixtures containing the analyte and often other additives such 
as acetic acid, flow to the capillary terminus. An ESI source has been described (Smith 
et al., Anal. Chem. 62:885, 1990) which can accommodate essentially any solvent 
system. Typical flow rates for ESI are 1-10 uL/min. The principal requirement of an 
ESI-MS interface is to sample and transport ions from the high pressure region into the 
MS as efficiently as possible. 

The efficiency of ESI can be very high, providing the basis for extremely 
sensitive measurements, which is useful for the invention described herein. Current 
instrumental performance can provide a total ion current at the detector of about 
2 x 10⁻¹² A or about 10⁷ counts/s for singly charged species. On the basis of the instrumental 
performance, concentrations as low as 10⁻¹⁰ M or about 10⁻¹⁸ mol/s of a singly 
charged species will give detectable ion current (about 10 counts/s) if the analyte is 
completely ionized. For example, low attomole detection limits have been obtained for 
quaternary ammonium ions using an ESI interface with capillary zone electrophoresis 
(Smith et al., Anal. Chem. 59:1230, 1988). For a compound of molecular weight of 
1000, the average number of charges is 1, the approximate number of charge states is 1, 
peak width (m/z) is 1 and the maximum intensity (ion/s) is 1 x 10¹². 
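
The sensitivity figures in this passage can be cross-checked with elementary unit conversions. The sketch below is illustrative only: it assumes the quoted detector current is 2 x 10⁻¹² A, the quoted delivery rate is 10⁻¹⁸ mol/s, and that roughly 10⁻⁵ of the ions formed reach the detector. The exponents in the source text are partly illegible, so these values are readings of it rather than independent data, and the function names are hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs per unit charge
AVOGADRO_PER_MOL = 6.02214076e23        # particles per mole


def ions_per_second_from_current(current_a: float, charge_state: int = 1) -> float:
    """Ion arrival rate implied by an ion current (amperes)."""
    return current_a / (charge_state * ELEMENTARY_CHARGE_C)


def ions_per_second_from_molar_flow(mol_per_s: float) -> float:
    """Ion formation rate if every delivered molecule is ionized."""
    return mol_per_s * AVOGADRO_PER_MOL


# Assumed readings of the figures quoted in the text.
detector_rate = ions_per_second_from_current(2e-12)
formed_rate = ions_per_second_from_molar_flow(1e-18)
detected_rate = formed_rate * 1e-5      # assumed overall detection fraction

print(f"detector current -> {detector_rate:.2e} ions/s")
print(f"10^-18 mol/s     -> {formed_rate:.2e} ions/s formed")
print(f"detected         -> {detected_rate:.1f} counts/s")
```

Under these assumptions the current corresponds to roughly 10⁷ singly charged ions per second, and a delivery rate of 10⁻¹⁸ mol/s yields a detected count rate of the order of 10 counts/s, which is consistent with the figures given.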

Remarkably little sample is actually consumed in obtaining an ESI mass 
spectrum (Smith et al., Anal. Chem. 60:1948, 1988). Substantial gains might also be 
obtained by the use of array detectors with sector instruments, allowing simultaneous 
detection of portions of the spectrum. Since currently only about 10⁻⁵ of all ions formed 
by ESI are detected, attention to the factors limiting instrument performance may 
provide a basis for improved sensitivity. It will be evident to those in the art that the 
present invention contemplates and accommodates improvements in ionization and 
detection methodologies. 

An interface is preferably placed between the separation instrumentation 
(e.g., gel) and the detector (e.g., mass spectrometer). The interface preferably has the 
following properties: (1) the ability to collect the DNA fragments at discrete time 
intervals, (2) concentrate the DNA fragments, (3) remove the DNA fragments from the 
electrophoresis buffers and milieu, (4) cleave the tag from the DNA fragment, 
(5) separate the tag from the DNA fragment, (6) dispose of the DNA fragment, (7) place 
the tag in a volatile solution, (8) volatilize and ionize the tag, and (9) place or transport 
the tag to an electrospray device that introduces the tag into the mass spectrometer. 

The interface also has the capability of "collecting" DNA fragments as 
they elute from the bottom of a gel. The gel may be composed of a slab gel, a tubular 
gel, a capillary, etc. The DNA fragments can be collected by several methods. The first 
method is the use of an electric field wherein DNA fragments are collected onto or 
near an electrode. A second method is that wherein the DNA fragments are collected by 
flowing a stream of liquid past the bottom of a gel. Aspects of both methods can be 
combined wherein DNA is collected into a flowing stream which can be later concentrated 
by use of an electric field. The end result is that DNA fragments are removed from the 
milieu under which the separation method was performed. That is, DNA fragments can 
be "dragged" from one solution type to another by use of an electric field. 

Once the DNA fragments are in the appropriate solution (compatible 
with electrospray and mass spectrometry) the tag can be cleaved from the DNA 
fragment. The DNA fragment (or remnants thereof) can then be separated from the tag 
by the application of an electric field (preferably, the tag is of opposite charge to that of 
the DNA fragment). The tag is then introduced into the electrospray device by the use of an 
electric field or a flowing liquid. 

Fluorescent tags can be identified and quantitated most directly by their 
absorption and fluorescence emission wavelengths and intensities. 

While a conventional spectrofluorometer is extremely flexible, providing 
continuous ranges of excitation and emission wavelengths (λEX, λS1, λS2), more specialized 
instruments such as flow cytometers and laser-scanning microscopes require probes that 
are excitable at a single fixed wavelength. In contemporary instruments, this is usually 
the 488-nm line of the argon laser. 

Fluorescence intensity per probe molecule is proportional to the product 
of ε (the molar extinction coefficient) and QY (the quantum yield). The range of these 
parameters among fluorophores of current practical 
importance is approximately 10,000 to 100,000 cm⁻¹M⁻¹ for ε and 0.1 to 1.0 for QY. 
When absorption is driven toward saturation by high-intensity illumination, the 
irreversible destruction of the excited fluorophore (photobleaching) becomes the factor 
limiting fluorescence detectability. The practical impact of photobleaching depends on 
the fluorescent detection technique in question. 
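
The proportionality between fluorescence intensity and the product of extinction coefficient and quantum yield can be illustrated with a short calculation. This is a sketch only; the two parameter sets below are simply the endpoints of the ranges quoted above and do not correspond to any named fluorophore.

```python
def relative_brightness(extinction_coefficient: float, quantum_yield: float) -> float:
    """Per-molecule brightness figure: extinction coefficient (cm^-1 M^-1) x QY."""
    if extinction_coefficient < 0:
        raise ValueError("extinction coefficient must be non-negative")
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return extinction_coefficient * quantum_yield


# Endpoints of the ranges quoted in the text (hypothetical fluorophores).
dimmest = relative_brightness(10_000, 0.1)
brightest = relative_brightness(100_000, 1.0)

print(f"dimmest   : {dimmest:,.0f}")
print(f"brightest : {brightest:,.0f}")
print(f"ratio     : {brightest / dimmest:,.0f}")
```

On this measure the quoted ranges span about two orders of magnitude in per-molecule brightness, before any photobleaching limit is considered.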

It will be evident to one in the art that a device (an interface) may be 
interposed between the separation and detection steps to permit the continuous 
operation of size separation and tag detection (in real time). This unites the separation 
methodology and instrumentation with the detection methodology and instrumentation, 
forming a single device. For example, an interface is interposed between a separation 
technique and detection by mass spectrometry or potentiostatic amperometry. 

The function of the interface is primarily the release of the (e.g., mass 
spectrometry) tag from the analyte. There are several representative implementations of the 
interface. The design of the interface is dependent on the choice of cleavable linkers. 
In the case of light or photo-cleavable linkers, an energy or photon source is required. In 
the case of an acid-labile linker, a base-labile linker, or a disulfide linker, reagent 
addition is required within the interface. In the case of heat-labile linkers, an energy 
heat source is required. Enzyme addition is required for an enzyme-sensitive linker 
such as a specific protease and a peptide linker, a nuclease and a DNA or RNA linker, a 
glycosylase, HRP or phosphatase and a linker which is unstable after cleavage (e.g., 
similar to chemiluminescent substrates). Other characteristics of the interface include 
minimal band broadening and separation of DNA from tags before injection into a mass 
spectrometer. Separation techniques include those based on electrophoretic methods 
and techniques, affinity techniques, size retention (dialysis), filtration and the like. 

It is also possible to concentrate the tags (or nucleic acid-linker-tag
construct), capture them electrophoretically, and then release them into an alternate reagent stream
which is compatible with the particular type of ionization method selected. The
interface may also be capable of capturing the tags (or nucleic acid-linker-tag construct)
on microbeads, shooting the bead(s) into a chamber and then performing laser
desorption/vaporization. Also, it is possible to extract in flow into an alternate buffer (e.g.,
from capillary electrophoresis buffer into hydrophobic buffer across a permeable
membrane). It may also be desirable in some uses to deliver tags into the mass
spectrometer intermittently, which would comprise a further function of the interface.
Another function of the interface is to deliver tags from multiple columns into a mass
spectrometer, with a rotating time slot for each column. Also, it is possible to deliver
tags from a single column into multiple MS detectors, separated by time, collect each
set of tags for a few milliseconds, and then deliver to a mass spectrometer.
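
The rotating time slot for multiple columns can be pictured as a simple round-robin schedule. The following Python sketch is illustrative only: the column names, the slot length and the function name are assumptions chosen for the example and are not specified in the text.

```python
from itertools import cycle, islice


def rotating_slots(columns, slot_ms, n_slots):
    """Return (start_ms, end_ms, column) windows, visiting the columns in a fixed rotation.

    slot_ms is a hypothetical slot length; the text gives no value.
    """
    schedule = []
    for i, column in enumerate(islice(cycle(columns), n_slots)):
        start = i * slot_ms
        schedule.append((start, start + slot_ms, column))
    return schedule


# Three hypothetical columns sharing one mass spectrometer, 5 ms per slot.
for window in rotating_slots(["column_1", "column_2", "column_3"], slot_ms=5, n_slots=6):
    print(window)
```

Each column is sampled once per rotation, so the effective sampling interval for a given column is the slot length multiplied by the number of columns.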

The following is a list of representative vendors for separation and
detection technologies which may be used in the present invention. Hoefer Scientific
Instruments (San Francisco, CA) manufactures electrophoresis equipment (Two Step™,
Poker Face™ II) for sequencing applications. Pharmacia Biotech (Piscataway, NJ)
manufactures electrophoresis equipment for DNA separations and sequencing
(PhastSystem for PCR-SSCP analysis, MacroPhor System for DNA sequencing).
Perkin Elmer/Applied Biosystems Division (ABI, Foster City, CA) manufactures
semi-automated sequencers based on fluorescent dyes (ABI373 and ABI377). Analytical
Spectral Devices (Boulder, CO) manufactures UV spectrometers. Hitachi Instruments
(Tokyo, Japan) manufactures Atomic Absorption spectrometers, Fluorescence
spectrometers, LC and GC Mass Spectrometers, NMR spectrometers, and UV-VIS
Spectrometers. PerSeptive Biosystems (Framingham, MA) produces Mass
Spectrometers (Voyager™ Elite). Bruker Instruments Inc. (Manning Park, MA)
manufactures FTIR Spectrometers (Vector 22), FT-Raman Spectrometers, Time of
Flight Mass Spectrometers (Reflex II™), Ion Trap Mass Spectrometer (Esquire™) and
a MALDI Mass Spectrometer. Analytical Technology Inc. (ATI, Boston, MA) makes
Capillary Gel Electrophoresis units, UV detectors, and Diode Array Detectors.
Teledyne Electronic Technologies (Mountain View, CA) manufactures an Ion Trap
Mass Spectrometer (3DQ Discovery™ and the 3DQ Apogee™). Perkin Elmer/Applied
Biosystems Division (Foster City, CA) manufactures a Sciex Mass Spectrometer (triple
quadrupole LC/MS/MS, the API 100/300) which is compatible with electrospray.
Hewlett-Packard (Santa Clara, CA) produces Mass Selective Detectors (HP 5972A),
MALDI-TOF Mass Spectrometers (HP G2025A), Diode Array Detectors, CE units,
HPLC units (HP 1090) as well as UV Spectrometers. Finnigan Corporation (San Jose,
CA) manufactures mass spectrometers (magnetic sector (MAT 95 S™), quadrupole
spectrometers (MAT 95 SQ™) and four other related mass spectrometers). Rainin
(Emeryville, CA) manufactures HPLC instruments.

The methods and compositions described herein permit the use of
cleaved tags to serve as maps to particular sample type and nucleotide identity. At the
beginning of each sequencing method, a particular (selected) primer is assigned a
particular unique tag. The tags map to either a sample type, a dideoxy terminator type
(in the case of a Sanger sequencing reaction) or preferably both. Specifically, the tag
maps to a primer type which in turn maps to a vector type which in turn maps to a
sample identity. The tag may also map to a dideoxy terminator type (ddTTP,
ddCTP, ddGTP, ddATP) by reference to the dideoxynucleotide reaction into which the tagged
primer is placed. The sequencing reaction is then performed and the resulting fragments
are sequentially separated by size in time.

The tags are cleaved from the fragments in a temporal frame and
measured and recorded in a temporal frame. The sequence is constructed by comparing
the tag map to the temporal frame. That is, all tag identities are recorded in time after
the sizing step and thereby become related to one another in a temporal frame. The
sizing step separates the nucleic acid fragments by a one nucleotide increment and
hence the related tag identities are separated by a one nucleotide increment. By
foreknowledge of the dideoxy-terminator or nucleotide map and sample type, the
sequence is readily deduced in a linear fashion.
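
The mapping logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the tag labels, sample names and detection times are hypothetical, and each detection is assumed to represent one fragment length (i.e., peaks have already been resolved to one entry per nucleotide increment).

```python
from collections import defaultdict

# Hypothetical tag map: each tag label -> (sample identity, dideoxy terminator base).
TAG_MAP = {
    "tag_1": ("sample_A", "A"), "tag_2": ("sample_A", "C"),
    "tag_3": ("sample_A", "G"), "tag_4": ("sample_A", "T"),
    "tag_5": ("sample_B", "A"), "tag_6": ("sample_B", "C"),
    "tag_7": ("sample_B", "G"), "tag_8": ("sample_B", "T"),
}


def deduce_sequences(detections, tag_map=TAG_MAP):
    """Build one sequence per sample from (time, tag) detections.

    Fragments leave the sizing step shortest first, so reading the tags in
    time order gives the bases in order of increasing fragment length.
    """
    reads = defaultdict(list)
    for _time, tag in sorted(detections):
        sample, base = tag_map[tag]
        reads[sample].append(base)
    return {sample: "".join(bases) for sample, bases in reads.items()}


# Hypothetical detections: two samples co-eluting, one tag per sample per time step.
detections = [(1, "tag_4"), (1, "tag_7"), (2, "tag_1"),
              (2, "tag_6"), (3, "tag_3"), (3, "tag_8")]
print(deduce_sequences(detections))
```

Because every tag carries both the sample identity and the terminator identity, the two samples are separated by a dictionary lookup rather than by running them in separate lanes.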

The following examples are offered by way of illustration, and not by 
way of limitation. 

Unless otherwise stated, chemicals as used in the examples may be
obtained from Aldrich Chemical Company, Milwaukee, WI. The following
abbreviations, with the indicated meanings, are used herein:

ANP = 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic acid
NBA = 4-(Fmoc-aminomethyl)-3-nitrobenzoic acid
HATU = O-7-azabenzotriazol-1-yl-N,N,N',N'-tetramethyluronium hexafluorophosphate
DIEA = diisopropylethylamine
MCT = monochlorotriazine
NMM = 4-methylmorpholine
NMP = N-methylpyrrolidone
ACT357 = ACT357 peptide synthesizer from Advanced ChemTech, Inc., Louisville, KY
ACT = Advanced ChemTech, Inc., Louisville, KY
NovaBiochem = CalBiochem-NovaBiochem International, San Diego, CA
TFA = Trifluoroacetic acid
Tfa = Trifluoroacetyl
iNIP = N-Methylisonipecotic acid
Tfp = Tetrafluorophenyl
DIAEA = 2-(Diisopropylamino)ethylamine
5'-AH-ODN = 5'-aminohexyl-tailed oligodeoxynucleotide

EXAMPLES

EXAMPLE 1
Preparation of Acid Labile Linkers for Use in
Cleavable-MW-Identifier Sequencing

A. Synthesis of Pentafluorophenyl Esters of Chemically Cleavable Mass
Spectroscopy Tags, to Liberate Tags with Carboxyl Amide Termini

Figure 1 shows the reaction scheme.

Step A. TentaGel S AC resin (compound II; available from ACT; 1 eq.) is suspended
with DMF in the collection vessel of the ACT357 peptide synthesizer (ACT).
Compound I (3 eq.), HATU (3 eq.) and DIEA (7.5 eq.) in DMF are added and the
collection vessel shaken for 1 hr. The solvent is removed and the resin washed with
NMP (2X), MeOH (2X), and DMF (2X). The coupling of I to the resin and the wash
steps are repeated, to give compound III.

Step B. The resin (compound III) is mixed with 25% piperidine in DMF and shaken for
5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10
min. The solvent is removed, the resin washed with NMP (2X), MeOH (2X), and DMF
(2X), and used directly in step C.

Step C. The deprotected resin from step B is suspended in DMF and to it is added an
FMOC-protected amino acid, containing amine functionality in its side chain
(compound IV, e.g. alpha-N-FMOC-3-(3-pyridyl)-alanine, available from Synthetech,
Albany, OR; 3 eq.), HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessel is shaken
for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH (2X),
and DMF (2X). The coupling of IV to the resin and the wash steps are repeated, to give
compound V.

Step D. The resin (compound V) is treated with piperidine as described in step B to
remove the FMOC group. The deprotected resin is then divided equally by the ACT357
from the collection vessel into 16 reaction vessels.

Step E. The 16 aliquots of deprotected resin from step D are suspended in DMF. To
each reaction vessel is added the appropriate carboxylic acid VI(1-16) (R(1-16)CO2H; 3 eq.),
HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The
solvent is removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and
DMF (2X). The coupling of VI(1-16) to the aliquots of resin and the wash steps are
repeated, to give compounds VII(1-16).

Step F. The aliquots of resin (compounds VII(1-16)) are washed with CH2Cl2 (3X). To
each of the reaction vessels is added 1% TFA in CH2Cl2 and the vessels shaken for 30
min. The solvent is filtered from the reaction vessels into individual tubes. The
aliquots of resin are washed with CH2Cl2 (2X) and MeOH (2X) and the filtrates
combined into the individual tubes. The individual tubes are evaporated in vacuo,
providing compounds VIII(1-16).

Step G. Each of the free carboxylic acids VIII(1-16) is dissolved in DMF. To each
solution is added pyridine (1.05 eq.), followed by pentafluorophenyl trifluoroacetate
(1.1 eq.). The mixtures are stirred for 45 min. at room temperature. The solutions are
diluted with EtOAc, washed with 1 M aq. citric acid (3X) and 5% aq. NaHCO3 (3X),
dried over Na2SO4, filtered, and evaporated in vacuo, providing compounds IX(1-16).
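
The coupling steps above state reagent quantities as equivalents relative to the resin. As a small worked illustration, the Python sketch below converts those equivalents into millimoles. The resin quantity of 0.25 mmol is a hypothetical value chosen for the example; the text does not state a scale.

```python
# Equivalents used in the coupling steps (relative to 1 eq. of resin), as stated in the text.
COUPLING_EQUIVALENTS = {"acid component": 3.0, "HATU": 3.0, "DIEA": 7.5}


def reagent_mmol(resin_mmol, equivalents=COUPLING_EQUIVALENTS):
    """Scale each reagent's equivalents by the amount of resin (in mmol)."""
    return {name: resin_mmol * eq for name, eq in equivalents.items()}


# Hypothetical scale: 0.25 mmol of resin-bound sites.
print(reagent_mmol(0.25))
```

Since each coupling is repeated once, the total reagent consumed per step is twice the amount computed for a single coupling.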

B. Synthesis of Pentafluorophenyl Esters of Chemically Cleavable Mass
Spectroscopy Tags, to Liberate Tags with Carboxyl Acid Termini

Figure 2 shows the reaction scheme.



Step A. 4-(Hydroxymethyl)phenoxybutyric acid (compound I; 1 eq.) is combined with
DIEA (2.1 eq.) and allyl bromide (2.1 eq.) in CHCl3 and heated to reflux for 2 hr. The
mixture is diluted with EtOAc, washed with 1 N HCl (2X), pH 9.5 carbonate buffer
(2X), and brine (1X), dried over Na2SO4, and evaporated in vacuo to give the allyl ester
of compound I.

Step B. The allyl ester of compound I from step A (1.75 eq.) is combined in CH2Cl2
with an FMOC-protected amino acid containing amine functionality in its side chain
(compound II, e.g. alpha-N-FMOC-3-(3-pyridyl)-alanine, available from Synthetech,
Albany, OR; 1 eq.), N-methylmorpholine (2.5 eq.), and HATU (1.1 eq.), and stirred at
room temperature for 4 hr. The mixture is diluted with CH2Cl2, washed with 1 M aq.
citric acid (2X), water (1X), and 5% aq. NaHCO3 (2X), dried over Na2SO4, and
evaporated in vacuo. Compound III is isolated by flash chromatography (CH2Cl2 ->
EtOAc).

Step C. Compound III is dissolved in CH2Cl2, Pd(PPh3)4 (0.07 eq.) and N-methylaniline
(2 eq.) are added, and the mixture stirred at room temperature for 4 hr. The mixture is
diluted with CH2Cl2, washed with 1 M aq. citric acid (2X) and water (1X), dried over
Na2SO4, and evaporated in vacuo. Compound IV is isolated by flash chromatography
(CH2Cl2 -> EtOAc + HOAc).

Step D. TentaGel S AC resin (compound V; 1 eq.) is suspended with DMF in the
collection vessel of the ACT357 peptide synthesizer (Advanced ChemTech Inc. (ACT),
Louisville, KY). Compound IV (3 eq.), HATU (3 eq.) and DIEA (7.5 eq.) in DMF are
added and the collection vessel shaken for 1 hr. The solvent is removed and the resin
washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of IV to the resin
and the wash steps are repeated, to give compound VI.

Step E. The resin (compound VI) is mixed with 25% piperidine in DMF and shaken for
5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10
min. The solvent is removed and the resin washed with NMP (2X), MeOH (2X), and
DMF (2X). The deprotected resin is then divided equally by the ACT357 from the
collection vessel into 16 reaction vessels.

Step F. The 16 aliquots of deprotected resin from step E are suspended in DMF. To
each reaction vessel is added the appropriate carboxylic acid VII(1-16) (R(1-16)CO2H; 3 eq.),
HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The
solvent is removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and
DMF (2X). The coupling of VII(1-16) to the aliquots of resin and the wash steps are
repeated, to give compounds VIII(1-16).

Step G. The aliquots of resin (compounds VIII(1-16)) are washed with CH2Cl2 (3X). To
each of the reaction vessels is added 1% TFA in CH2Cl2 and the vessels shaken for 30
min. The solvent is filtered from the reaction vessels into individual tubes. The
aliquots of resin are washed with CH2Cl2 (2X) and MeOH (2X) and the filtrates
combined into the individual tubes. The individual tubes are evaporated in vacuo,
providing compounds IX(1-16).

Step H. Each of the free carboxylic acids IX(1-16) is dissolved in DMF. To each solution
is added pyridine (1.05 eq.), followed by pentafluorophenyl trifluoroacetate (1.1 eq.).
The mixtures are stirred for 45 min. at room temperature. The solutions are diluted with
EtOAc, washed with 1 M aq. citric acid (3X) and 5% aq. NaHCO3 (3X), dried over
Na2SO4, filtered, and evaporated in vacuo, providing compounds X(1-16).

EXAMPLE 2
Demonstration of Photolytic Cleavage
of T-L-X

A T-L-X compound as prepared in Example 1 3 was irradiated with near- 
UV light for 7 min at room temperature. A Rayonett fluorescence UV lamp (Southern 
New England Ultraviolet Co., Middletown, CT) with an emission peak at 350 nm is
used as a source of UV light. The lamp is placed at a 15-cm distance from the Petri 
dishes with samples. SDS gel electrophoresis shows that >85% of the conjugate is 
cleaved under these conditions. 

EXAMPLE 3
Preparation of Fluorescent Labeled Primers and
Demonstration of Cleavage of Fluorophore

Synthesis and Purification of Oligonucleotides

The oligonucleotides (ODNs) are prepared on automated DNA
synthesizers using the standard phosphoramidite chemistry supplied by the vendor or
the H-phosphonate chemistry (Glen Research, Sterling, VA). Appropriately blocked
dA, dG, dC, and T phosphoramidites are commercially available in these forms, and
synthetic nucleosides may readily be converted to the appropriate form.
Oligonucleotides are purified by adaptations
of standard methods. Oligonucleotides with 5'-trityl groups are chromatographed on
HPLC using a 12 micrometer, 300 Å Rainin (Emeryville, CA) Dynamax C-8 4.2x250
mm reverse phase column using a gradient of 15% to 55% MeCN in 0.1 N
Et3NH+OAc-, pH 7.0, over 20 min. When detritylation is performed, the
oligonucleotides are further purified by gel exclusion chromatography. Analytical
checks for the quality of the oligonucleotides are conducted with a PRP-column
(Alltech, Deerfield, IL) at alkaline pH and by PAGE.

Preparation of 2,4,6-trichlorotriazine derived oligonucleotides: 10 to
1000 µg of 5'-terminal amine linked oligonucleotide are reacted with an excess of
recrystallized cyanuric chloride in 10% n-methyl-pyrrolidone in alkaline (pH 8.3 to 8.5
preferably) buffer at 19°C to 25°C for 30 to 120 minutes. The final reaction conditions
consist of 0.15 M sodium borate at pH 8.3, 2 mg/ml recrystallized cyanuric chloride and
500 µg/ml respective oligonucleotide. The unreacted cyanuric chloride is removed by
size exclusion chromatography on a G-50 Sephadex (Pharmacia, Piscataway, NJ)
column.

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with amine-reactive
fluorochromes. The derived ODN preparation is divided into 3 portions and each
portion is reacted with either (a) 20-fold molar excess of Texas Red sulfonyl chloride
(Molecular Probes, Eugene, OR), (b) 20-fold molar excess of Lissamine sulfonyl
chloride (Molecular Probes, Eugene, OR), or (c) 20-fold molar excess of fluorescein
isothiocyanate. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3
for 1 hour at room temperature. The unreacted fluorochromes are removed by size
exclusion chromatography on a G-50 Sephadex column.

To cleave the fluorochrome from the oligonucleotide, the ODNs are
adjusted to 1 x 10^-5 molar and then dilutions are made (12, 3-fold dilutions) in TE (TE is
0.01 M Tris, pH 7.0, 5 mM EDTA). To 100 µl volumes of ODNs, 25 µl of 0.01 M
dithiothreitol (DTT) is added. To an identical set of controls no DTT is added. The
mixture is incubated for 15 minutes at room temperature. Fluorescence is measured in a
black microtiter plate: the solution is removed from the incubation tubes (150
microliters) and placed in a black microtiter plate (Dynatek Laboratories, Chantilly,
VA). The plates are then read directly using a Fluoroskan II fluorometer (Flow
Laboratories, McLean, VA) using an excitation wavelength of 495 nm and monitoring
emission at 520 nm for fluorescein, using an excitation wavelength of 591 nm and
monitoring emission at 612 nm for Texas Red, and using an excitation wavelength of
570 nm and monitoring emission at 590 nm for lissamine.

Moles of          RFU            RFU        RFU
Fluorochrome      non-cleaved    cleaved    free

1.0 x 10^-5 M     6.4            1200       1345
3.3 x 10^-6 M     2.4            451        (illegible)
1.1 x 10^-6 M     0.9            135        130
3.7 x 10^-7 M     0.3            44         48
1.2 x 10^-7 M     0.12           15.3       16.0
4.1 x 10^-8 M     0.14           4.9        5.1
1.4 x 10^-8 M     0.13           2.5        2.8
4.5 x 10^-9 M     0.12           0.8        0.9



The data indicate that there is about a 200-fold increase in relative fluorescence when 
the fluorochrome is cleaved from the ODN. 
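
The fold increase can be checked directly from the tabulated readings. The Python sketch below is illustrative: the RFU pairs are copied from the table, while the concentration column is recomputed from the stated starting concentration and the 3-fold dilution scheme rather than read from the table.

```python
# (non-cleaved RFU, cleaved RFU) pairs from the table, most concentrated sample first.
READINGS = [(6.4, 1200), (2.4, 451), (0.9, 135), (0.3, 44),
            (0.12, 15.3), (0.14, 4.9), (0.13, 2.5), (0.12, 0.8)]
START_MOLAR = 1.0e-5  # starting ODN concentration stated in the text


def fold_increase(non_cleaved, cleaved):
    """Ratio of fluorescence after cleavage to fluorescence before cleavage."""
    return cleaved / non_cleaved


for step, (non_cleaved, cleaved) in enumerate(READINGS):
    concentration = START_MOLAR / 3 ** step  # successive 3-fold dilutions
    print(f"{concentration:.1e} M  {fold_increase(non_cleaved, cleaved):6.1f}-fold")
```

The ratio is close to 190-fold for the two most concentrated samples, in line with the "about 200-fold" figure, and falls at the lowest concentrations, where the non-cleaved reading no longer decreases with dilution.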



EXAMPLE 4
Preparation of Tagged M13 Sequence Primers
and Demonstration of Cleavage of Tags

Preparation of 2,4,6-trichlorotriazine derived oligonucleotides: 1000 µg
of 5'-terminal amine linked oligonucleotide
(5'-hexylamine-TGTAAAACGACGGCCAGT-3') (Seq. ID No. 1) are reacted with an excess of
recrystallized cyanuric chloride in 10% n-methyl-pyrrolidone alkaline (pH 8.3 to 8.5
preferably) buffer at 19°C to 25°C for 30 to 120 minutes. The final reaction conditions
consist of 0.15 M sodium borate at pH 8.3, 2 mg/ml recrystallized cyanuric chloride and
500 µg/ml respective oligonucleotide. The unreacted cyanuric chloride is removed by
size exclusion chromatography on a G-50 Sephadex column.

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with a variety of amides.
The derived ODN preparation is divided into 12 portions and each portion is reacted (25
molar excess) with the pentafluorophenyl-esters of either: (1) 4-methoxybenzoic acid,
(2) 4-fluorobenzoic acid, (3) toluic acid, (4) benzoic acid, (5) indole-3-acetic acid,
(6) 2,6-difluorobenzoic acid, (7) nicotinic acid N-oxide, (8) 2-nitrobenzoic acid,
(9) 5-acetylsalicylic acid, (10) 4-ethoxybenzoic acid, (11) cinnamic acid, or
(12) 3-aminonicotinic acid. The reaction is for 2 hours at 37°C in 0.2 M NaBorate pH 8.3.
The derived ODNs are purified by gel exclusion chromatography on G-50 Sephadex.

To cleave the tag from the oligonucleotide, the ODNs are adjusted to 1 x
10^-5 molar and then dilutions are made (12, 3-fold dilutions) in TE (TE is 0.01 M Tris,
pH 7.0, 5 mM EDTA) with 50% EtOH (V/V). To 100 µl volumes of ODNs, 25 µl of
0.01 M dithiothreitol (DTT) is added. To an identical set of controls no DTT is added.
Incubation is for 30 minutes at room temperature. NaCl is then added to 0.1 M and 2
volumes of EtOH are added to precipitate the ODNs. The ODNs are removed from
solution by centrifugation at 14,000 x G at 4°C for 15 minutes. The supernatants are
reserved and dried to completeness. The pellet is then dissolved in 25 µl MeOH. The
pellet is then tested by mass spectrometry for the presence of tags.

The mass spectrometer used in this work is an external ion source
Fourier-transform mass spectrometer (FTMS). Samples prepared for MALDI analysis
are deposited on the tip of a direct probe and inserted into the ion source. When the
sample is irradiated with a laser pulse, ions are extracted from the source and passed
into a long quadrupole ion guide that focuses and transports them to an FTMS analyzer
cell located inside the bore of a superconducting magnet.

The spectra yield the following information. Peaks varying in intensity
from 25 to 100 relative intensity units are observed at the following molecular weights: (1) 212.1
amu indicating 4-methoxybenzoic acid derivative, (2) 200.1 amu indicating 4-fluorobenzoic
acid derivative, (3) 196.1 amu indicating toluic acid derivative, (4) 182.1 amu indicating
benzoic acid derivative, (5) 235.2 amu indicating indole-3-acetic acid derivative,
(6) 218.1 amu indicating 2,6-difluorobenzoic derivative, (7) 199.1 amu indicating
nicotinic acid N-oxide derivative, (8) 227.1 amu indicating 2-nitrobenzamide,
(9) 179.18 amu indicating 5-acetylsalicylic acid derivative, (10) 226.1 amu indicating
4-ethoxybenzoic acid derivative, (11) 209.1 amu indicating cinnamic acid derivative,
(12) 198.1 amu indicating 3-aminonicotinic acid derivative.

The results indicate that the MW-identifiers are cleaved from the primers 
and are detectable by mass spectrometry. 
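
Assigning an observed peak to a tag amounts to a nearest-mass lookup. The Python sketch below illustrates this using the twelve masses listed above. The matching tolerance of 0.3 amu is an assumption made for the example (the text does not state a mass accuracy); it is kept below half of the smallest spacing between listed tags (1.0 amu) so that no peak can match two tags.

```python
# Tag masses (amu) and identities as listed in the text.
TAG_MASSES = {
    212.1: "4-methoxybenzoic acid derivative",
    200.1: "4-fluorobenzoic acid derivative",
    196.1: "toluic acid derivative",
    182.1: "benzoic acid derivative",
    235.2: "indole-3-acetic acid derivative",
    218.1: "2,6-difluorobenzoic derivative",
    199.1: "nicotinic acid N-oxide derivative",
    227.1: "2-nitrobenzamide",
    179.18: "5-acetylsalicylic acid derivative",
    226.1: "4-ethoxybenzoic acid derivative",
    209.1: "cinnamic acid derivative",
    198.1: "3-aminonicotinic acid derivative",
}


def identify_tag(observed_amu, tolerance=0.3):
    """Return the identity of the closest listed tag, or None if nothing is within tolerance.

    tolerance is a hypothetical instrument-dependent value, not taken from the text.
    """
    closest = min(TAG_MASSES, key=lambda mass: abs(mass - observed_amu))
    if abs(closest - observed_amu) <= tolerance:
        return TAG_MASSES[closest]
    return None


print(identify_tag(212.15))
print(identify_tag(150.0))
```

Returning None for unmatched peaks keeps background ions from being forced onto the nearest tag.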



EXAMPLE 5
Demonstration of Sequencing Using an HPLC Separation Method,
Collecting Fractions, Cleaving the MW-Identifiers, Determining the
Mass (and thus the Identity) of the MW-Identifier and then
Deducing the Sequence

The following oligonucleotides are prepared as described in Example 4:

DMO 767: 5'-hexylamine-TGTAAAACGACGGCCAGT-3' (Seq. ID No. 1)
DMO 768: 5'-hexylamine-TGTAAAACGACGGCCAGTA-3' (Seq. ID No. 2)
DMO 769: 5'-hexylamine-TGTAAAACGACGGCCAGTAT-3' (Seq. ID No. 3)
DMO 770: 5'-hexylamine-TGTAAAACGACGGCCAGTATG-3' (Seq. ID No. 4)
DMO 771: 5'-hexylamine-TGTAAAACGACGGCCAGTATGC-3' (Seq. ID No. 5)
DMO 772: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCA-3' (Seq. ID No. 6)
DMO 773: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCAT-3' (Seq. ID No. 7)
DMO 774: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCATG-3' (Seq. ID No. 8)

100 µg of each of the 5'-terminal amine-linked oligonucleotides
described above are reacted with an excess of recrystallized cyanuric chloride in 10%
n-methyl-pyrrolidone alkaline (pH 8.3 to 8.5 preferably) buffer at 19°C to 25°C for 30 to
120 minutes. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3, 2
mg/ml recrystallized cyanuric chloride and 500 µg/ml respective oligonucleotide. The
unreacted cyanuric chloride is removed by size exclusion chromatography on a G-50
Sephadex column.

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with a particular
pentafluorophenyl-ester of the following: (1) DMO 767 with 4-methoxybenzoic acid,
(2) DMO 768 with 4-fluorobenzoic acid, (3) toluic acid, (4) DMO 769 with benzoic acid,
(5) DMO 770 with indole-3-acetic acid, (6) DMO 771 with 2,6-difluorobenzoic acid,
(7) DMO 772 with nicotinic acid N-oxide, (8) DMO 773 with 2-nitrobenzoic acid.

10 ng of each of the eight derived ODNs are mixed together and then
size separated by HPLC. The mixture is placed in 25 µl of distilled water. The entire
sample is injected on to the following column. A LiChrospher 4000 DMAE, 50-10 mm
column is used (EM Separations, Wakefield, RI). Eluent A is 20 mM Na2HPO4 in 20%
ACN, pH 7.4; Eluent B is Eluent A + 1 M NaCl, pH 7.4. The flow rate is 1 ml/min
and detection is UV @ 280 nm. The gradient is as follows: 0 min. @ 100% A and 0%
B, 3 min. @ 100% A and 0% B, 15 min. @ 80% A and 20% B, 60 min. @ 0% A and
100% B, 63 min. @ 0% A and 100% B, 65 min. @ 100% A and 0% B, 70 min. @
100% A and 0% B. Fractions are collected at 0.5 minute intervals.
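
The gradient program above can be expressed as a list of set points and evaluated at any time. The Python sketch below is illustrative and assumes linear ramps between consecutive set points, which is the usual behavior of a gradient controller but is not stated in the text; the function names are chosen for the example.

```python
# Gradient set points from the text: (time in min, % Eluent B). % Eluent A is 100 - %B.
GRADIENT = [(0, 0), (3, 0), (15, 20), (60, 100), (63, 100), (65, 0), (70, 0)]


def percent_b(t_min):
    """Percent Eluent B at time t_min, assuming linear ramps between set points."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def fraction_end_time(fraction_number, interval_min=0.5):
    """End time (min) of a fraction when fractions are collected every interval_min."""
    return fraction_number * interval_min


print(percent_b(15), percent_b(37.5), fraction_end_time(31))
```

Because Eluent B is Eluent A plus 1 M NaCl, the NaCl concentration in the mobile phase is simply the %B value divided by 100, in mol/l.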

To cleave the tags from the oligonucleotide, 100 µl of 0.05 M
dithiothreitol (DTT) is added to each fraction. Incubation is for 30 minutes at room
temperature. NaCl is then added to 0.1 M and 2 volumes of EtOH are added to
precipitate the ODNs. The ODNs are removed from solution by centrifugation at
14,000 x G at 4°C for 15 minutes. The supernatants are reserved and dried to completeness
under a vacuum with centrifugation. The pellets are then dissolved in 25 µl MeOH.
The pellet is then tested by mass spectrometry for the presence of MW-identifiers. The
same MALDI technique is employed as described in Example 4. The following MWs
(tags) are observed in the mass spectra as a function of time:

Fraction #   Time (min)   MWs      Fraction #   Time (min)   MWs

1            0.5          none     31           15.5         212.1
2            1.0          none     32           16.0         212.1
3            1.5          none     33           16.5         212.1
4            2.0          none     34           17.0         212.1; 200.1
5            2.5          none     35           17.5         200.1
6            3.0          none     36           18.0         200.1
7            3.5          none     37           18.5         200.1
8            4.0          none     38           19.0         200.1; 196.1
9            4.5          none     39           19.5         200.1; 196.1
10           5.0          none     40           20.0         196.1
11           5.5          none     41           20.5         196.1
12           6.0          none     42           21.0         196.1; 182.1
13           6.5          none     43           21.5         182.1
14           7.0          none     44           22.0         182.1
15           7.5          none     45           22.5         182.1; 235.2
16           8.0          none     46           23.0         235.2
17           8.5          none     47           23.5         235.2
18           9.0          none     48           24.0         235.2; 218.1
19           9.5          none     49           24.5         218.1
20           10.0         none     50           25.0         218.1
21           10.5         none     51           25.5         218.1; 199.1
22           11.0         none     52           26.0         199.1
23           11.5         none     53           26.5         199.1; 227.1
24           12.0         none     54           27.0         227.1
25           12.5         none     55           27.5         227.1
26           13.0         none     56           28.0         none
27           13.5         none     57           28.5         none
28           14.0         none     58           29.0         none
29           14.5         none     59           29.5         none
30           15.0         none     60           30.0         none

The temporal appearance of the tags is thus 212.1, 200.1, 196.1, 182.1,
235.2, 218.1, 199.1, 227.1. Since 212.1 amu indicates the 4-methoxybenzoic acid
derivative, 200.1 amu indicates the 4-fluorobenzoic acid derivative, 196.1 amu indicates the
toluic acid derivative, 182.1 amu indicates the benzoic acid derivative, 235.2 amu
indicates the indole-3-acetic acid derivative, 218.1 amu indicates the
2,6-difluorobenzoic derivative, 199.1 amu indicates the nicotinic acid N-oxide derivative, and
227.1 amu indicates the 2-nitrobenzamide, the sequence can be deduced as
-5'-ATGCATG-3'-.
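
The deduction can be traced step by step. The Python sketch below is a minimal illustration, not the procedure of the example itself: it takes the tag masses observed in fractions 31 to 55, orders the tags by first appearance, and reads off the 3'-terminal base of the oligonucleotide associated with each tag. It assumes that the eight tags, in order of elution, correspond to DMO 767 through DMO 774 in order of increasing length; that pairing is an assumption made for the sketch.

```python
# Tag masses (amu) observed per fraction, for the fractions in which tags appear.
FRACTION_TAGS = {
    31: [212.1], 32: [212.1], 33: [212.1], 34: [212.1, 200.1],
    35: [200.1], 36: [200.1], 37: [200.1], 38: [200.1, 196.1],
    39: [200.1, 196.1], 40: [196.1], 41: [196.1], 42: [196.1, 182.1],
    43: [182.1], 44: [182.1], 45: [182.1, 235.2], 46: [235.2],
    47: [235.2], 48: [235.2, 218.1], 49: [218.1], 50: [218.1],
    51: [218.1, 199.1], 52: [199.1], 53: [199.1, 227.1],
    54: [227.1], 55: [227.1],
}

# DMO 767 through DMO 774, shortest first (sequences from the list in this example).
ODNS = [
    "TGTAAAACGACGGCCAGT",
    "TGTAAAACGACGGCCAGTA",
    "TGTAAAACGACGGCCAGTAT",
    "TGTAAAACGACGGCCAGTATG",
    "TGTAAAACGACGGCCAGTATGC",
    "TGTAAAACGACGGCCAGTATGCA",
    "TGTAAAACGACGGCCAGTATGCAT",
    "TGTAAAACGACGGCCAGTATGCATG",
]


def order_of_first_appearance(fraction_tags):
    """List each tag mass once, in the order in which it first appears."""
    seen = []
    for fraction in sorted(fraction_tags):
        for mass in fraction_tags[fraction]:
            if mass not in seen:
                seen.append(mass)
    return seen


tag_order = order_of_first_appearance(FRACTION_TAGS)
# Assumed pairing: n-th tag to elute <-> n-th shortest oligonucleotide.
tag_to_odn = dict(zip(tag_order, ODNS))
# The first tag marks the primer itself; each later tag adds one 3'-terminal base.
sequence = "".join(tag_to_odn[mass][-1] for mass in tag_order[1:])
print(tag_order)
print(sequence)
```

Under the stated pairing, the eight tags yield seven added bases, since the first tag corresponds to the unextended primer.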



EXAMPLE 6
Demonstration of Sequencing of Two DNA Samples in a
Single HPLC Separation Method

In this example, two DNA samples are sequenced in a single separation
method.

The following oligonucleotides are prepared as described in Example 1:

DMO 767: 5'-hexylamine-TGTAAAACGACGGCCAGT-3' (Seq. ID No. 1)
DMO 768: 5'-hexylamine-TGTAAAACGACGGCCAGTA-3' (Seq. ID No. 2)
DMO 769: 5'-hexylamine-TGTAAAACGACGGCCAGTAT-3' (Seq. ID No. 3)
DMO 770: 5'-hexylamine-TGTAAAACGACGGCCAGTATG-3' (Seq. ID No. 4)
DMO 771: 5'-hexylamine-TGTAAAACGACGGCCAGTATGC-3' (Seq. ID No. 5)
DMO 772: 5'-hexylamine-TGTAAAACGACGGCCAGTATGCA-3' (Seq. ID No. 6)
DMO 775: 5'-hexylamine-TGTAAAACGACGGCCAGC-3' (Seq. ID No. 9)
DMO 776: 5'-hexylamine-TGTAAAACGACGGCCAGCG-3' (Seq. ID No. 10)
DMO 777: 5'-hexylamine-TGTAAAACGACGGCCAGCGT-3' (Seq. ID No. 11)
DMO 778: 5'-hexylamine-TGTAAAACGACGGCCAGCGTA-3' (Seq. ID No. 12)
DMO 779: 5'-hexylamine-TGTAAAACGACGGCCAGCGTAC-3' (Seq. ID No. 13)
DMO 780: 5'-hexylamine-TGTAAAACGACGGCCAGCGTACC-3' (Seq. ID No. 14)

100 µg of each of the 5'-terminal amine-linked oligonucleotides
described above are reacted with an excess of recrystallized cyanuric chloride in 10%
n-methyl-pyrrolidone alkaline (pH 8.3 to 8.5 preferably) buffer at 19°C to 25°C for 30 to
120 minutes. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3, 2
mg/ml recrystallized cyanuric chloride and 500 µg/ml respective oligonucleotide. The
unreacted cyanuric chloride is removed by size exclusion chromatography on a G-50
Sephadex column.

The activated purified oligonucleotide is then reacted with a 100-fold
molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room
temperature. The unreacted cystamine is removed by size exclusion chromatography on
a G-50 Sephadex column. The derived ODNs are then reacted with a particular
pentafluorophenyl-ester of the following: (1) DMO 767 with 4-methoxybenzoic acid and
DMO 773 with nicotinic acid N-oxide, (2) DMO 768 with 4-fluorobenzoic acid and
DMO 774 with 2-nitrobenzoic acid, (3) toluic acid and DMO 775 with acetylsalicylic
acid, (4) DMO 769 with benzoic acid and DMO 776 with 4-ethoxybenzoic acid,
(5) DMO 770 with indole-3-acetic acid and DMO 777 with cinnamic acid, (6) DMO 771
with 2,6-difluorobenzoic acid and DMO 778 with 3-aminonicotinic acid. Therefore,
there is one set of tags for each set of ODNs.

10 ng of each of the 12 derived ODNs are mixed together and then size
separated by HPLC. The mixture is placed in 25 µl of distilled water. The entire
sample is injected on to the following column. A LiChrospher 4000 DMAE, 50-10 mm
column is used (EM Separations, Wakefield, RI). Eluent A is 20 mM Na2HPO4 in 20%
ACN, pH 7.4; Eluent B is Eluent A + 1 M NaCl, pH 7.4. The flow rate is 1 ml/min
and detection is UV @ 280 nm. The gradient is as follows: 0 min. @ 100% A and 0%
B, 3 min. @ 100% A and 0% B, 15 min. @ 80% A and 20% B, 60 min. @ 0% A and
100% B, 63 min. @ 0% A and 100% B, 65 min. @ 100% A and 0% B, 70 min. @
100% A and 0% B. Fractions are collected at 0.5 minute intervals.

To cleave the tags from the oligonucleotide, 100 µl of 0.05 M
dithiothreitol (DTT) is added to each fraction. Incubation is for 30 minutes at room
temperature. NaCl is then added to 0.1 M and 2 volumes of EtOH are added to
precipitate the ODNs. The ODNs are removed from solution by centrifugation at
14,000 x G at 4°C for 15 minutes. The supernatants are reserved and dried to completeness
under a vacuum with centrifugation. The pellets are then dissolved in 25 µl MeOH.
The pellet is then tested by mass spectrometry for the presence of tags. The same
MALDI technique is employed as described in Example 4. The following MWs (tags)
are observed in the mass spectra as a function of time:



Fraction #   Time (min)   MWs      Fraction #   Time (min)   MWs

1            0.5          none     31           15.5         212.1, 199.1
2            1.0          none     32           16.0         212.1, 199.1
3            1.5          none     33           16.5         212.1, 199.1
4            2.0          none     34           17.0         212.1, 200.1, 199.1, 227.1
5            2.5          none     35           17.5         200.1, 199.1, 227.1
6            3.0          none     36           18.0         200.1, 227.1
7            3.5          none     37           18.5         200.1, 227.1, 179.18
8            4.0          none     38           19.0         200.1, 196.1, 179.18
9            4.5          none     39           19.5         200.1, 196.1, 179.18
10           5.0          none     40           20.0         196.1, 179.18, 226.1
11           5.5          none     41           20.5         196.1, 226.1
12           6.0          none     42           21.0         196.1, 182.1, 226.1
13           6.5          none     43           21.5         182.1, 226.1, 209.1
14           7.0          none     44           22.0         182.1, 209.1
15           7.5          none     45           22.5         182.1, 235.2, 209.1, 198.1
16           8.0          none     46           23.0         235.2, 198.1
17           8.5          none     47           23.5         235.2, 198.1
18           9.0          none     48           24.0         235.2, 198.1, 218.1
19           9.5          none     49           24.5         218.1
20           10.0         none     50           25.0         218.1
21           10.5         none     51           25.5         none
22           11.0         none     52           26.0         none
23           11.5         none     53           26.5         none
24           12.0         none     54           27.0         none
25           12.5         none     55           27.5         none
26           13.0         none     56           28.0         none
27           13.5         none     57           28.5         none
28           14.0         none     58           29.0         none
29           14.5         none     59           29.5         none
30           15.0         none     60           30.0         none


The temporal appearance of the tags for set #1 is 212.1, 200.1, 196.1,
182.1, 235.2, 218.1, 199.1, 227.1, and the temporal appearance of tags for set #2 is
199.1, 227.1, 179.1, 226.1, 209.1, 198.1. Since 212.1 amu indicates the
4-methoxybenzoic acid derivative, 200.1 amu indicates the 4-fluorobenzoic acid derivative,
196.1 amu indicates the toluic acid derivative, 182.1 amu indicates the benzoic acid
derivative, 235.2 amu indicates the indole-3-acetic acid derivative, 218.1 amu indicates
the 2,6-difluorobenzoic derivative, 199.1 amu indicates the nicotinic acid N-oxide
derivative, 227.1 amu indicates the 2-nitrobenzamide, 179.18 amu indicates the
5-acetylsalicylic acid derivative, 226.1 amu indicates the 4-ethoxybenzoic acid derivative,
209.1 amu indicates the cinnamic acid derivative, and 198.1 amu indicates the
3-aminonicotinic acid, the first sequence can be deduced as -5'-TATGCA-3'- and the
second sequence can be deduced as -5'-CGTACC-3'-. Thus, it is possible to sequence
more than one DNA sample per separation step.
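The step from the fraction table to a "temporal appearance" list can be sketched as a first-appearance ordering of tag masses, taken separately for each tag set. The snippet below is a minimal illustration, not the procedure used in the example: the fraction data are copied from the table of observed MWs (semicolons treated as commas), while the assignment of six masses to each set and the function name are assumptions made for this sketch.

```python
# Fraction number -> tag masses (amu) observed; fractions with no tags omitted.
observed = {
    31: [212.1, 199.1],
    32: [212.1, 199.1],
    33: [212.1, 199.1],
    34: [212.1, 200.1, 199.1, 227.1],
    35: [200.1, 199.1, 227.1],
    36: [200.1, 227.1],
    37: [200.1, 227.1, 179.18],
    38: [200.1, 196.1, 179.18],
    39: [200.1, 196.1, 179.18],
    40: [196.1, 179.18, 226.1],
    41: [196.1, 226.1],
    42: [196.1, 182.1, 226.1],
    43: [182.1, 226.1, 209.1],
    44: [182.1, 209.1],
    45: [182.1, 235.2, 209.1, 198.1],
    46: [235.2, 198.1],
    47: [235.2, 198.1],
    48: [235.2, 198.1, 218.1],
    49: [218.1],
    50: [218.1],
}

# Assumed membership of the two tag sets (one set per DNA sample).
SET_1 = {212.1, 200.1, 196.1, 182.1, 235.2, 218.1}
SET_2 = {199.1, 227.1, 179.18, 226.1, 209.1, 198.1}


def first_appearance_order(observed, tag_set):
    """List the tags of one set in the order they first appear in the fractions."""
    first_seen = {}
    for fraction in sorted(observed):
        for mass in observed[fraction]:
            if mass in tag_set and mass not in first_seen:
                first_seen[mass] = fraction
    return sorted(first_seen, key=first_seen.get)


order_1 = first_appearance_order(observed, SET_1)
order_2 = first_appearance_order(observed, SET_2)
```

Reading a base sequence from each ordered list additionally requires the tag-to-nucleotide assignment used when the tagged fragments were prepared, which this sketch does not model.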



EXAMPLE 7 

Preparation of a Set of Compounds 
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-TFP

Figure 3 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lʰ), where Lʰ is an activated ester (specifically, tetrafluorophenyl ester), L² is an
ortho-nitrobenzylamine group with L³ being a methylene group that links Lʰ and L², T
has a modular structure wherein the carboxylic acid group of lysine has been joined to
the nitrogen atom of the L² benzylamine group to form an amide bond, and a variable
weight component R₁₋₃₆ (where these R groups correspond to T² as defined herein, and
may be introduced via any of the specific carboxylic acids listed herein) is bonded
through the α-amino group of the lysine, while a mass spec sensitivity enhancer group
(introduced via N-methylisonipecotic acid) is bonded through the ε-amino group of the
lysine.

Referring to Figure 3: 

Step A. NovaSyn HMP Resin (available from NovaBiochem; 1 eq.) is suspended with
DMF in the collection vessel of the ACT357. Compound I (ANP available from ACT;
3 eq.), HATU (3 eq.) and NMM (7.5 eq.) in DMF are added and the collection vessel
shaken for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH
(2X), and DMF (2X). The coupling of I to the resin and the wash steps are repeated, to
give compound II.
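The equivalents quoted in step A are all relative to the resin (1 eq.), so reagent amounts scale directly with the amount of reactive sites on the resin. The following sketch shows that arithmetic; the batch size of 0.10 mmol and the function name are hypothetical and are not taken from the example.

```python
def reagent_mmol(resin_mmol, equivalents):
    """Millimoles of each reagent for a given amount of resin reactive sites."""
    return {name: resin_mmol * eq for name, eq in equivalents.items()}


# Equivalents quoted in step A, relative to the resin (1 eq.).
STEP_A_EQ = {"ANP (compound I)": 3.0, "HATU": 3.0, "NMM": 7.5}

# Hypothetical batch: 0.10 mmol of reactive sites on the resin.
amounts = reagent_mmol(0.10, STEP_A_EQ)
```

Because the coupling is repeated, the same amounts would be needed a second time.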

Step B. The resin (compound II) is mixed with 25% piperidine in DMF and shaken for
5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10
min. The solvent is removed, the resin washed with NMP (2X), MeOH (2X), and DMF
(2X), and used directly in step C.

Step C. The deprotected resin from step B is suspended in DMF and to it is added an
FMOC-protected amino acid, containing a protected amine functionality in its side
chain (Fmoc-Lysine(Aloc)-OH, available from PerSeptive Biosystems; 3 eq.), HATU (3
eq.), and NMM (7.5 eq.) in DMF. The vessel is shaken for 1 hr. The solvent is
removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X). The
coupling of Fmoc-Lys(Aloc)-OH to the resin and the wash steps are repeated, to give
compound IV.

Step D. The resin (compound IV) is washed with CH₂Cl₂ (2X), and then suspended in a
solution of (PPh₃)₄Pd(0) (0.3 eq.) and PhSiH₃ (10 eq.) in CH₂Cl₂. The mixture is
shaken for 1 hr. The solvent is removed and the resin is washed with CH₂Cl₂ (2X).
The palladium step is repeated. The solvent is removed and the resin is washed with
CH₂Cl₂ (2X), N,N-diisopropylethylammonium diethyldithiocarbamate in DMF (2X), and
DMF (2X) to give compound V.

Step E. The deprotected resin from step D is coupled with N-methylisonipecotic acid as
described in step C to give compound VI.

Step F. The Fmoc protected resin VI is divided equally by the ACT357 from the
collection vessel into 36 reaction vessels to give compounds VI₁₋₃₆.

Step G. The resin (compounds VI₁₋₃₆) is treated with piperidine as described in step B to
remove the FMOC group.

Step H. The 36 aliquots of deprotected resin from step G are suspended in DMF. To
each reaction vessel is added the appropriate carboxylic acid (R₁₋₃₆CO₂H; 3 eq.), HATU
(3 eq.), and NMM (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The solvent is
removed and the aliquots of resin washed with NMP (2X), MeOH (1X), and DMF (2X).
The coupling of R₁₋₃₆CO₂H to the aliquots of resin and the wash steps are repeated, to
give compounds VIII₁₋₃₆.

Step I. The aliquots of resin (compounds VIII₁₋₃₆) are washed with CH₂Cl₂ (3X). To
each of the reaction vessels is added 90:5:5 TFA:H₂O:CH₂Cl₂ and the vessels shaken
for 120 min. The solvent is filtered from the reaction vessels into individual tubes. The
aliquots of resin are washed with CH₂Cl₂ (2X) and MeOH (2X) and the filtrates
combined into the individual tubes. The individual tubes are evaporated in vacuo,
providing compounds IX₁₋₃₆.

Step J. Each of the free carboxylic acids IX₁₋₃₆ is dissolved in DMF. To each solution is
added pyridine (1.05 eq.), followed by tetrafluorophenyl trifluoroacetate (1.1 eq.). The
mixtures are stirred for 45 min. at room temperature. The solutions are diluted with
EtOAc, washed with 5% aq. NaHCO₃ (3X), dried over Na₂SO₄, filtered, and evaporated
in vacuo, providing compounds X₁₋₃₆.

EXAMPLE 8
Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Lys(ε-iNIP)-NBA-TFP

Figure 4 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lʰ), where Lʰ is an activated ester (specifically, tetrafluorophenyl ester), L² is an
ortho-nitrobenzylamine group with L³ being a direct bond between Lʰ and L², where Lʰ
is joined directly to the aromatic ring of the L² group, T has a modular structure wherein
the carboxylic acid group of lysine has been joined to the nitrogen atom of the L²
benzylamine group to form an amide bond, and a variable weight component R₁₋₃₆
(where these R groups correspond to T² as defined herein, and may be introduced via
any of the specific carboxylic acids listed herein) is bonded through the α-amino group
of the lysine, while a mass spec enhancer group (introduced via N-methylisonipecotic
acid) is bonded through the ε-amino group of the lysine.

Referring to Figure 4:

Step A. NovaSyn HMP Resin is coupled with compound I (NBA prepared according
to the procedure of Brown et al., Molecular Diversity, 1, 4 (1995)) according to the
procedure described in step A of Example 7, to give compound II.

Steps B-J. The resin (compound II) is treated as described in steps B-J of Example 7 to
give compounds X₁₋₃₆.

EXAMPLE 9 
Preparation of a Set of Compounds 
of the Formula iNIP-Lys(ε-R₁₋₃₆)-ANP-TFP

Figure 5 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lʰ), where Lʰ is an activated ester (specifically, tetrafluorophenyl ester), L² is an
ortho-nitrobenzylamine group with L³ being a methylene group that links Lʰ and L², T
has a modular structure wherein the carboxylic acid group of lysine has been joined to
the nitrogen atom of the L² benzylamine group to form an amide bond, and a variable
weight component R₁₋₃₆ (where these R groups correspond to T² as defined herein, and
may be introduced via any of the specific carboxylic acids listed herein) is bonded
through the ε-amino group of the lysine, while a mass spec sensitivity enhancer group
(introduced via N-methylisonipecotic acid) is bonded through the α-amino group of the
lysine.

Referring to Figure 5:

Steps A-C. Same as in Example 7.

Step D. The resin (compound IV) is treated with piperidine as described in step B of
Example 7 to remove the FMOC group.

Step E. The deprotected α-amine on the resin in step D is coupled with
N-methylisonipecotic acid as described in step C of Example 7 to give compound V.

Step F. Same as in Example 7.

Step G. The resin (compounds VI₁₋₃₆) is treated with palladium as described in step D
of Example 7 to remove the Aloc group.

Steps H-J. The compounds X₁₋₃₆ are prepared in the same manner as in Example 7.



EXAMPLE 10
Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Glu(γ-DIAEA)-ANP-TFP

Figure 6 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lʰ), where Lʰ is an activated ester (specifically, tetrafluorophenyl ester), L² is an
ortho-nitrobenzylamine group with L³ being a methylene group that links Lʰ and L², T
has a modular structure wherein the α-carboxylic acid group of glutamic acid has
been joined to the nitrogen atom of the L² benzylamine group to form an amide bond,
and a variable weight component R₁₋₃₆ (where these R groups correspond to T² as
defined herein, and may be introduced via any of the specific carboxylic acids listed
herein) is bonded through the α-amino group of the glutamic acid, while a mass spec
sensitivity enhancer group (introduced via 2-(diisopropylamino)ethylamine) is bonded
through the γ-carboxylic acid of the glutamic acid.

Referring to Figure 6:

Steps A-B. Same as in Example 7.

Step C. The deprotected resin (compound III) is coupled to Fmoc-Glu-(OAl)-OH using
the coupling method described in step C of Example 7 to give compound IV.

Step D. The allyl ester on the resin (compound IV) is washed with CH₂Cl₂ (2X) and
mixed with a solution of (PPh₃)₄Pd(0) (0.3 eq.) and N-methylaniline (3 eq.) in CH₂Cl₂.
The mixture is shaken for 1 hr. The solvent is removed and the resin is washed with
CH₂Cl₂ (2X). The palladium step is repeated. The solvent is removed and the resin is
washed with CH₂Cl₂ (2X), N,N-diisopropylethylammonium diethyldithiocarbamate in
DMF (2X), and DMF (2X) to give compound V.

Step E. The deprotected resin from step D is suspended in DMF and activated by
mixing HATU (3 eq.), and NMM (7.5 eq.). The vessels are shaken for 15 minutes. The
solvent is removed and the resin washed with NMP (1X). The resin is mixed with
2-(diisopropylamino)ethylamine (3 eq.) and NMM (7.5 eq.). The vessels are shaken for 1
hour. The coupling of 2-(diisopropylamino)ethylamine to the resin and the wash steps
are repeated, to give compound VI.

Steps F-J. Same as in Example 7.

EXAMPLE 11
Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-Lys(ε-NH₂)-NH₂

Figure 7 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lʰ), where Lʰ is an amine (specifically, the ε-amino group of a lysine-derived
moiety), L² is an ortho-nitrobenzylamine group with L³ being a
carboxamido-substituted alkyleneaminoacylalkylene group that links Lʰ and L², T has a modular
structure wherein the carboxylic acid group of lysine has been joined to the nitrogen
atom of the L² benzylamine group to form an amide bond, and a variable weight
component R₁₋₃₆ (where these R groups correspond to T² as defined herein, and may be
introduced via any of the specific carboxylic acids listed herein) is bonded through the
α-amino group of the lysine, while a mass spec sensitivity enhancer group (introduced
via N-methylisonipecotic acid) is bonded through the ε-amino group of the lysine.

Referring to Figure 7:

Step A. Fmoc-Lys(Boc)-SRAM Resin (available from ACT; compound I) is mixed
with 25% piperidine in DMF and shaken for 5 min. The resin is filtered, then mixed
with 25% piperidine in DMF and shaken for 10 min. The solvent is removed, the resin
washed with NMP (2X), MeOH (2X), and DMF (2X), and used directly in step B.

Step B. The resin (compound II), ANP (available from ACT; 3 eq.), HATU (3 eq.) and
NMM (7.5 eq.) in DMF are added and the collection vessel shaken for 1 hr. The
solvent is removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X).
The coupling of I to the resin and the wash steps are repeated, to give compound III.

Steps C-J. The resin (compound III) is treated as in steps B-I in Example 7 to give
compounds X₁₋₃₆.

EXAMPLE 12
Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Lys(ε-TFA)-Lys(ε-iNIP)-ANP-TFP

Figure 8 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = Lʰ), where Lʰ is an activated ester (specifically, tetrafluorophenyl ester), L² is an
ortho-nitrobenzylamine group with L³ being a methylene group that links Lʰ and L², T
has a modular structure wherein the carboxylic acid group of a first lysine has been
joined to the nitrogen atom of the L² benzylamine group to form an amide bond, a mass
spec sensitivity enhancer group (introduced via N-methylisonipecotic acid) is bonded
through the ε-amino group of the first lysine, a second lysine molecule has been joined
to the first lysine through the α-amino group of the first lysine, a molecular weight
adjuster group (having a trifluoroacetyl structure) is bonded through the ε-amino group
of the second lysine, and a variable weight component R₁₋₃₆ (where these R groups
correspond to T² as defined herein, and may be introduced via any of the specific
carboxylic acids listed herein) is bonded through the α-amino group of the second
lysine.

Referring to Figure 8:

Steps A-E. These steps are identical to steps A-E in Example 7.

Step F. The resin (compound VI) is treated with piperidine as described in step B in
Example 7 to remove the FMOC group.

Step G. The deprotected resin (compound VII) is coupled to Fmoc-Lys(Tfa)-OH using
the coupling method described in step C of Example 7 to give compound VIII.

Steps H-K. The resin (compound VIII) is treated as in steps F-J in Example 7 to give
compounds XI₁₋₃₆.

EXAMPLE 13

Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-5'-AH-ODN

Figure 9 illustrates the parallel synthesis of a set of 36 T-L-X compounds
(X = MOI, where MOI is a nucleic acid fragment, ODN) derived from the esters of
Example 7 (the same procedure could be used with other T-L-X compounds wherein X
is an activated ester). The MOI is conjugated to T-L through the 5' end of the MOI, via
a phosphodiester - alkyleneamine group.

Referring to Figure 9:

Step A. Compounds XII₁₋₃₆ are prepared according to a modified biotinylation
procedure in Van Ness et al., Nucleic Acids Res., 19, 3345 (1991). To a solution of one
of the 5'-aminohexyl oligonucleotides (compounds XI₁₋₃₆, 1 mg) in 200 mM sodium
borate (pH 8.3, 250 µl) is added one of the tetrafluorophenyl esters (compounds X₁₋₃₆
from Example 7, 100-fold molar excess in 250 µl of NMP). The reaction is incubated
overnight at ambient temperature. The unreacted and hydrolyzed tetrafluorophenyl
esters are removed from the compounds XII₁₋₃₆ by Sephadex G-50 chromatography.
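The "100-fold molar excess" in step A fixes the amount of ester once the molar mass of the oligonucleotide is known. The sketch below works through that conversion for 1 mg of oligonucleotide; the molar mass of 6000 g/mol is a hypothetical placeholder (the example does not state one), as is the function name.

```python
def ester_needed_umol(oligo_mg, oligo_mw_g_per_mol, fold_excess=100.0):
    """Micromoles of active ester for a given molar excess over the oligonucleotide."""
    oligo_umol = oligo_mg * 1e-3 / oligo_mw_g_per_mol * 1e6
    return oligo_umol * fold_excess


# 1 mg of oligonucleotide (from step A) with an assumed molar mass of 6000 g/mol.
ester_umol = ester_needed_umol(1.0, 6000.0)
```

With these assumed inputs the oligonucleotide amounts to roughly 0.17 µmol, so the ester requirement is on the order of 17 µmol.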






EXAMPLE 14
Preparation of a Set of Compounds
of the Formula R₁₋₃₆-Lys(ε-iNIP)-ANP-Lys(ε-(MCT-5'-AH-ODN))-NH₂

Figure 10 illustrates the parallel synthesis of a set of 36 T-L-X
compounds (X = MOI, where MOI is a nucleic acid fragment, ODN) derived from the
amines of Example 11 (the same procedure could be used with other T-L-X compounds
wherein X is an amine). The MOI is conjugated to T-L through the 5' end of the MOI,
via a phosphodiester - alkyleneamine group.

Referring to Figure 10:

Step A. The 5'-[6-(4,6-dichloro-1,3,5-triazin-2-ylamino)hexyl]oligonucleotides XII₁₋₃₆
are prepared as described in Van Ness et al., Nucleic Acids Res., 19, 3345 (1991).

Step B. To a solution of one of the
5'-[6-(4,6-dichloro-1,3,5-triazin-2-ylamino)hexyl]oligonucleotides (compounds XII₁₋₃₆) at a concentration of 1 mg/ml in
100 mM sodium borate (pH 8.3) is added a 100-fold molar excess of a primary amine
selected from R₁₋₃₆-Lys(ε-iNIP)-ANP-Lys(ε-NH₂)-NH₂ (compounds X₁₋₃₆ from Example
11). The solution is mixed overnight at ambient temperature. The unreacted amine is
removed by ultrafiltration through a 3000 MW cutoff membrane (Amicon, Beverly,
MA) using H₂O as the wash solution (3X). The compounds XIII₁₋₃₆ are isolated by
reduction of the volume to 100 µl.

EXAMPLE 15

Demonstration of Sequencing Using a CE Separation Method, Collecting
Fractions, Cleaving the Tag, Determining the Mass (and thus the Identity) of
the Tag and then Deducing the Sequence.

In this example, two DNA samples are sequenced in a single separation
method.

CE Instrumentation

The CE instrument is a breadboard version of the instrument available
commercially from Applied Biosystems, Inc. (Foster City, CA). It consists of Plexiglas
boxes enclosing two buffer chambers, which can be maintained at constant temperature
with a heat control unit. The voltage necessary for electrophoresis is provided by a
high-voltage power supply (Gamma High Voltage Research, Ormond Beach, FL) with a
magnetic safety interlock, and a control unit to vary the applied potential. Sample
injections for open tube capillaries are performed by use of a hand vacuum pump to
generate a pressure differential across the capillary (vacuum injection). For gel-filled
capillaries, samples are electrophoresed into the tube by application of an electric field
(electrokinetic injection).

Preparation of Gel-Filled Capillaries

Fifty-centimeter fused silica capillaries (375 µm o.d., 50 µm i.d.;
Polymicro Technologies, Phoenix, AZ) with detector windows (where the polyimide
coating has been removed from the capillary) at 25 cm are used in the separations. The
inner surface of the capillaries is derivatized with
(methacryloxypropyl)trimethoxysilane (MAPS) (Sigma, St. Louis, MO) to permit
covalent attachment of the gel to the capillary wall (Nashabeh et al., Anal. Chem.
63:2US, 1994). Briefly, the capillaries are cleaned by successively flowing
trifluoroacetic acid, deionized water, and acetone through the column. After the acetone
wash, a 0.2% solution of MAPS in 50/50 water/ethanol solution is passed through the
capillary and left at room temperature for 30 min. The solution is removed by
aspiration and the tubes are dried for 30 min under an infrared heat lamp.

Gel-filled capillaries are prepared under high pressure by a modification
of the procedure described by Huang et al. (J. Chromatography 600:289, 1992). Four
percent poly(acrylamide) gels with 5% cross linker and 8.3 M urea are used for all the
studies reported here. A stock solution is made by dissolving 3.8 g of acrylamide, 0.20
g of N,N'-methylenebis(acrylamide), and 50 g of urea into 100 mL of TBE buffer (90
mM Tris borate, pH 8.3, 0.2 mM EDTA). Cross linking is initiated with 10 µl of
N,N,N',N'-tetramethylethylenediamine (TEMED) and 250 µl of 10% ammonium
persulfate solution. The polymerizing solution is quickly passed into the derivatized
column. Filled capillaries are then placed in a steel tube (1 m x 1/8 in. i.d. x 1/4 in.
o.d.) filled with water, and the pressure is raised to 400 bar by using an HPLC pump and
maintained at that pressure overnight. The pressure is gradually released and the
capillaries are removed. A short section of capillary from each end of the column is
removed before use.
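The quoted gel specification (4% gel, 5% cross linker, 8.3 M urea) can be checked against the stock recipe with a short calculation. The sketch below treats the final volume as the nominal 100 mL of buffer, which is an assumption (dissolving 50 g of urea increases the volume); the function name is hypothetical, and 60.06 g/mol is the molar mass of urea.

```python
UREA_MW = 60.06  # g/mol, molar mass of urea (CH4N2O)


def gel_composition(acrylamide_g, bis_g, urea_g, volume_ml):
    """Return (%T, %C, urea molarity) for a gel stock solution.

    %T is total monomer in g per 100 mL; %C is the cross-linker share of
    total monomer. The volume is taken as the nominal buffer volume.
    """
    total_monomer = acrylamide_g + bis_g
    percent_t = 100.0 * total_monomer / volume_ml
    percent_c = 100.0 * bis_g / total_monomer
    urea_molar = urea_g / UREA_MW / (volume_ml / 1000.0)
    return percent_t, percent_c, urea_molar


# Stock recipe from the text: 3.8 g acrylamide, 0.20 g bis, 50 g urea, 100 mL TBE.
percent_t, percent_c, urea_molar = gel_composition(3.8, 0.20, 50.0, 100.0)
```

Under the nominal-volume assumption the recipe reproduces the stated 4% total monomer, 5% cross linker, and about 8.3 M urea.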

Separation and Detection of DNA Fragments

Analysis of DNA sequencing reactions separated by conventional
electrophoresis is performed on an ABI 370A DNA sequencer. This instrument uses a
slab denaturing urea poly(acrylamide) gel 0.4 mm thick with a distance of 26 cm from
the sample well to the detection region, prepared according to the manufacturer's
instructions. DNA sequencing reactions are prepared as described by the manufacturer
utilizing Taq polymerase (Promega Corp., Madison, WI) and are performed on
M13mp19 single-stranded DNA template prepared by standard procedures. Sequencing
reactions are stored at -20°C in the dark and heated at 90°C for 3 min in formamide just
prior to sample loading. They are loaded on the 370A with a Pipetman according to the
manufacturer's instructions and on the CE by electrokinetic injection at 10,000 V for 10
seconds. Ten µl fractions are collected during the run by removing all the liquid at
the bottom electrode and replacing it with new electrolyte.

To cleave the tag from the oligonucleotide, 100 µl of 0.05 M
dithiothreitol (DTT) is added to each fraction. Incubation is for 30 minutes at room
temperature. NaCl is then added to 0.1 M and 2 volumes of EtOH are added to
precipitate the ODNs. The ODNs are removed from solution by centrifugation at
14,000 x G at 4°C for 15 minutes. The supernatants are reserved and dried to completeness
under a vacuum with centrifugation. The pellets are then dissolved in 25 µl MeOH.
The pellet is then tested by mass spectrometry for the presence of tags. The same
MALDI technique is employed as described in Example 4. The following MWs (tags)
are observed in the mass spectra as a function of time:



Fraction #  Time  MWs     Fraction #  Time  MWs
1           1     none    31          31    212.1
2           2     none    32          32    212.1, 199.1
3           3     none    33          33    212.1, 199.1
4           4     none    34          34    212.1, 200.1, 199.1, 227.1
5           5     none    35          35    200.1, 199.1, 227.1
6           6     none    36          36    200.1, 227.1
7           7     none    37          37    200.1, 227.1, 179.18
8           8     none    38          38    200.1, 196.1, 179.18
9           9     none    39          39    200.1, 196.1, 179.18
10          10    none    40          40    196.1, 179.18, 226.1
11          11    none    41          41    196.1, 226.1
12          12    none    42          42    196.1, 182.1, 226.1
13          13    none    43          43    182.1, 226.1, 209.1
14          14    none    44          44    182.1, 209.1
15          15    none    45          45    182.1, 235.2, 209.1, 198.1
16          16    none    46          46    235.2, 198.1
17          17    none    47          47    235.2, 198.1
18          18    none    48          48    235.2, 198.1, 218.1
19          19    none    49          49    218.1
20          20    none    50          50    218.1
21          21    none    51          51    none
22          22    none    52          52    none
23          23    none    53          53    none
24          24    none    54          54    none
25          25    none    55          55    none
26          26    none    56          56    none
27          27    none    57          57    none
28          28    none    58          58    none
29          29    212.1   59          59    none
30          30    212.1   60          60    none



The temporal appearance of the tags for set #1 is 212.1, 200.1, 196.1,
182.1, 235.2, 218.1, 199.1, 227.1, and the temporal appearance of tags for set #2 is
199.1, 227.1, 179.1, 226.1, 209.1, 198.1. Since 212.1 amu indicates the
4-methoxybenzoic acid derivative, 200.1 amu indicates the 4-fluorobenzoic acid derivative,
196.1 amu indicates the toluic acid derivative, 182.1 amu indicates the benzoic acid
derivative, 235.2 amu indicates the indole-3-acetic acid derivative, 218.1 amu
indicates the 2,6-difluorobenzoic derivative, 199.1 amu indicates the nicotinic acid
N-oxide derivative, 227.1 amu indicates the 2-nitrobenzamide, 179.18 amu indicates the
5-acetylsalicylic acid derivative, 226.1 amu indicates the 4-ethoxybenzoic acid
derivative, 209.1 amu indicates the cinnamic acid derivative, and 198.1 amu indicates
the 3-aminonicotinic acid, the first sequence can be deduced as -5'-TATGCA-3'- and
the second sequence can be deduced as -5'-CGTACC-3'-. Thus, it is possible to
sequence more than one DNA sample per separation step.



EXAMPLE 16
Demonstration of the Simultaneous Detection of
Multiple Tags by Mass Spectrometry

This example provides a description of the ability to simultaneously
detect multiple compounds (tags) by mass spectrometry. In this particular example, 31
compounds are mixed with a matrix, deposited and dried onto a solid support and then
desorbed with a laser. The resultant ions are then introduced into a mass spectrometer.

The following compounds (purchased from Aldrich, Milwaukee, WI) are
mixed together on an equal molar basis to a final concentration of 0.002 M (on a per
compound basis): benzamide (121.14), nicotinamide (122.13), pyrazinamide (123.12),
3-amino-4-pyrazolecarboxylic acid (127.10), 2-thiophenecarboxamide (127.17),
4-aminobenzamide (135.15), toluamide (135.17), 6-methylnicotinamide (136.15),
3-aminonicotinamide (137.14), nicotinamide N-oxide (138.12), 3-hydroxypicolinamide
(138.13), 4-fluorobenzamide (139.13), cinnamamide (147.18), 4-methoxybenzamide
(151.17), 2,6-difluorobenzamide (157.12), 4-amino-5-imidazole-carboxyamide (162.58),
3,4-pyridine-dicarboxyamide (165.16), 4-ethoxybenzamide (165.19),
2,3-pyrazinedicarboxamide (166.14), 2-nitrobenzamide (166.14),
3-fluoro-4-methoxybenzoic acid (170.4), indole-3-acetamide (174.2), 5-acetylsalicylamide
(179.18), 3,5-dimethoxybenzamide (181.19), 1-naphthaleneacetamide (185.23),
8-chloro-3,5-diamino-2-pyrazinecarboxyamide (187.59), 4-trifluoromethyl-benzamide
(189.00), 5-amino-5-phenyl-4-pyrazole-carboxamide (202.22),
1-methyl-2-benzyl-malonamate (207.33), 4-amino-2,3,5,6-tetrafluorobenzamide (208.11),
2,3-naphthalenedicarboxylic acid (212.22). The compounds are placed in DMSO at the
concentration described above. One µl of the material is then mixed with
alpha-cyano-4-hydroxycinnamic acid matrix (after a 1:10,000 dilution) and deposited onto a solid
stainless steel support.

The material is then desorbed by a laser using the Protein TOF Mass
Spectrometer (Bruker, Manning Park, MA) and the resulting ions are measured in both
the linear and reflectron modes of operation. The following m/z values are observed
(Figure 11):

121.1 --> benzamide (121.14)
122.1 --> nicotinamide (122.13)
123.1 --> pyrazinamide (123.12)
124.1
125.2
127.3 --> 3-amino-4-pyrazolecarboxylic acid (127.10)
127.2 --> 2-thiophenecarboxamide (127.17)
135.1 --> 4-aminobenzamide (135.15)
135.1 --> toluamide (135.17)
136.2 --> 6-methylnicotinamide (136.15)
137.1 --> 3-aminonicotinamide (137.14)
138.2 --> nicotinamide N-oxide (138.12)
138.2 --> 3-hydroxypicolinamide (138.13)
139.2 --> 4-fluorobenzamide (139.13)
140.2
147.3 --> cinnamamide (147.18)
148.2
149.2
152.2
          4-methoxybenzamide (151.17)
          2,6-difluorobenzamide (157.12)
158.3
          4-amino-5-imidazole-carboxyamide (162.58)
163.3
165.2 --> 3,4-pyridine-dicarboxyamide (165.16)
165.2 --> 4-ethoxybenzamide (165.19)
166.2 --> 2,3-pyrazinedicarboxamide (166.14)
166.2 --> 2-nitrobenzamide (166.14)
          3-fluoro-4-methoxybenzoic acid (170.4)
171.1
172.2
173.4
          indole-3-acetamide (174.2)
178.3
179.3 --> 5-acetylsalicylamide (179.18)
181.2 --> 3,5-dimethoxybenzamide (181.19)
182.2 -->
186.2
          1-naphthaleneacetamide (185.23)
          8-chloro-3,5-diamino-2-pyrazinecarboxyamide (187.59)
188.2
189.2 --> 4-trifluoromethyl-benzamide (189.00)
190.2
191.2
192.3
          5-amino-5-phenyl-4-pyrazole-carboxamide (202.22)
203.2
203.4
          1-methyl-2-benzyl-malonamate (207.33)
          4-amino-2,3,5,6-tetrafluorobenzamide (208.11)
212.2 --> 2,3-naphthalenedicarboxylic acid (212.22)
219.3
221.2
228.2
234.2
237.4
241.4


The data indicate that 22 of 31 compounds appeared in the spectrum with
the anticipated mass, and 9 of 31 compounds appeared in the spectrum with an M + H mass (1
atomic mass unit, amu) over the anticipated mass. The latter phenomenon is probably
due to the protonation of an amine within the compounds. Therefore 31 of 31
compounds are detected by MALDI Mass Spectroscopy. More importantly, the
example demonstrates that multiple tags can be detected simultaneously by a
spectroscopic method.
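The distinction drawn above, between compounds seen at the anticipated mass and compounds seen one mass unit higher, can be expressed as a simple peak-matching rule. The sketch below is illustrative only: the anticipated masses and m/z values are a small subset taken from the lists in this example, while the 0.3 amu matching window and the function name are assumptions, not values from the text.

```python
PROTON_SHIFT = 1.0  # amu, the "+ H" offset quoted in the text


def classify(expected_mass, observed_peaks, tol=0.3):
    """Return 'M', 'M+H', or None for one compound.

    tol is a hypothetical matching window in amu. The anticipated mass is
    tried first, then the mass one unit higher.
    """
    candidates = (("M", expected_mass), ("M+H", expected_mass + PROTON_SHIFT))
    for label, target in candidates:
        if any(abs(peak - target) <= tol for peak in observed_peaks):
            return label
    return None


# A few observed m/z values and anticipated masses from this example.
peaks = [121.1, 136.2, 139.2, 152.2, 158.3]
expected = {
    "benzamide": 121.14,
    "6-methylnicotinamide": 136.15,
    "4-fluorobenzamide": 139.13,
    "4-methoxybenzamide": 151.17,
    "2,6-difluorobenzamide": 157.12,
}
calls = {name: classify(mass, peaks) for name, mass in expected.items()}
```

A real assignment would also have to exclude peaks that arise from the matrix alone or from contaminants.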

The alpha-cyano matrix alone (Figure 12) gave peaks at 146.2, 164.1,
172.1, 173.1, 189.1, 190.1, 191.1, 192.1, 212.1, 224.1, 228.0, 234.3. Other identified
masses in the spectrum are due to contaminants in the purchased compounds, as no
effort was made to further purify the compounds.






EXAMPLE 17 

A Procedure for Sequencing with MW-Identifier-Labeled Primers,
Radiolabeled Primers, MW-Identifier-Labeled-Dideoxy-Terminators,
Fluorescent-Primers and Fluorescent-Dideoxy-Terminators

A. Preparation of sequencing gels and electrophoresis

The protocol is as follows. Prepare 8 M urea, polyacrylamide gels
according to the following recipes (100 ml) for 4%, 5%, or 6% polyacrylamide.

                                   4%        5%        6%
urea                               48 g      48 g      48 g
40% acrylamide/bisacrylamide       10 ml     12.5 ml   15 ml
10X MTBE                           10 ml     10 ml     10 ml
ddH₂O                              42 ml     39.5 ml   37 ml
15% APS                            500 µl    500 µl    500 µl
TEMED                              50 µl     50 µl     50 µl
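The recipe columns follow from a simple dilution of the 40% acrylamide/bisacrylamide stock into a 100 ml gel mix. The sketch below checks the final acrylamide percentage for each column and shows that the three liquid components always add up to the same volume; the dictionary layout and function name are hypothetical, and the small APS and TEMED volumes are ignored.

```python
# Column label -> (40% stock in ml, 10X MTBE in ml, ddH2O in ml), from the recipe.
RECIPES = {
    "4%": (10.0, 10.0, 42.0),
    "5%": (12.5, 10.0, 39.5),
    "6%": (15.0, 10.0, 37.0),
}


def final_percent(stock_ml, stock_percent=40.0, total_ml=100.0):
    """Final acrylamide percentage after diluting the stock to total_ml."""
    return stock_percent * stock_ml / total_ml


# Final gel percentage and summed liquid volume for each recipe column.
percents = {label: final_percent(r[0]) for label, r in RECIPES.items()}
liquid_ml = {label: sum(r) for label, r in RECIPES.items()}
```

In each column the liquids total 62 ml, with the rest of the 100 ml made up by the dissolved urea; 10 ml of 10X MTBE in 100 ml likewise gives a 1X buffer.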



Urea (5505UA) is obtained from Gibco/BRL (Gaithersburg, MD). All
other materials are obtained from Fisher (Fair Lawn, NJ). Briefly, urea, MTBE buffer
and water are combined, incubated for 5 minutes at 55°C, and stirred to dissolve the
urea. The mixture is cooled briefly, acrylamide/bis-acrylamide solution is added and
mixed, and the entire mixture is degassed under vacuum for 5 minutes. APS and
TEMED polymerization agents are added with stirring. The complete gel mix is
immediately poured in between the taped glass plates with 0.15 mm spacers. Plates are
prepared by first cleaning with ALCONOX™ (New York, NY) detergent and hot water,
then rinsed with double distilled water and dried. Typically, the notched glass plate is
treated with a silanizing reagent and then rinsed with double distilled water. After
pouring, the gel is immediately laid horizontally, the well-forming comb is inserted and
clamped into place, and the gel is allowed to polymerize for at least 30 minutes. Prior to
loading, the tape around the bottom of the gel and the well-forming comb are removed.
A vertical electrophoresis apparatus is then assembled by clamping the upper and lower
buffer chambers to the gel plates, and adding 1X MTBE electrophoresis buffer to the
chambers. Sample wells are flushed with a syringe containing running buffer, and
immediately prior to loading each sample, the well is flushed with running buffer using
gel-loading tips to remove urea. One to two microliters of sample is loaded into each
well using a Pipetman (Rainin, Emeryville, CA) with gel-loading tips, and then
electrophoresed according to the following guidelines (during electrophoresis, the gel is
cooled with a fan):
termination reaction    polyacrylamide gel              electrophoresis conditions
short                   5%, 0.15 mm x 50 cm x 20 cm     2.25 hours at 9? mA
long                    4%, 0.15 mm x 70 cm x 20 cm     20-24 hours at 15 mA

Each base-specific sequencing reaction terminated with the short
termination mix is loaded onto a 0.15 mm x 50 cm x 20 cm denaturing 5%
polyacrylamide gel; reactions terminated with the long termination mix typically are
divided in half and loaded onto two 0.15 mm x 70 cm x 20 cm denaturing 4%
polyacrylamide gels.

After electrophoresis, buffer is removed from the wells, the tape is
removed, and the gel plates are separated. The gel is transferred to a 40 cm x 20 cm sheet
of 3MM Whatman paper, covered with plastic wrap, and dried on a Hoefer (San
Francisco, CA) gel dryer for 25 minutes at 80°C. The dried gel is exposed to Kodak
(New Haven, CT) XRP-1 film. Depending on the intensity of the signal and whether
the radiolabel is 32P or 35S, exposure times vary from 4 hours to several days. After
exposure, films are developed by processing in developer and fixer solutions, rinsed
with water, and air dried. The autoradiogram is then placed on a light-box, the
sequence is manually read, and the data typed into a computer.

Taq-polymerase catalyzed cycle sequencing using labeled primers. Each
base-specific cycle sequencing reaction routinely includes approximately 100 or 200 ng
isolated single-stranded DNA for A and C or G and T reactions, respectively. Double-stranded
cycle sequencing reactions similarly contain approximately 200 or 400 ng of
plasmid DNA isolated using either the standard alkaline lysis or the diatomaceous
earth-modified alkaline lysis procedures. All reagents except template DNA are added in one
pipetting step from a premix of previously aliquoted stock solutions stored at -20°C.
Reaction premixes are prepared by combining reaction buffer with the base-specific
nucleotide mixes. Prior to use, the base-specific reaction premixes are thawed and
combined with diluted Taq DNA polymerase and the individual end-labeled universal
primers to yield the final reaction mixes. Once the above mixes are prepared, four
aliquots of single or double-stranded DNA are pipetted into the bottom of each 0.2 ml
thin-walled reaction tube, corresponding to the A, C, G, and T reactions, and then an
aliquot of the respective reaction mixes is added to the side of each tube. These tubes
are part of a 96-tube/retainer set tray in a microtiter plate format, which fits into a
Perkin Elmer Cetus Cycler 9600 (Foster City, CA). Strip caps are sealed onto the
tube/retainer set and the plate is centrifuged briefly. The plate then is placed in the
cycler, whose heat block has been preheated to 95°C, and the cycling program is
immediately started. The cycling protocol consists of 15-30 cycles of: 95°C
denaturation; 55°C annealing; 72°C extension; 95°C denaturation; 72°C extension;
95°C denaturation, and 72°C extension, linked to a 4°C final soak file.

At this stage, the reactions may be frozen and stored at -20°C for up to
several days. Prior to pooling and precipitation, the plate is centrifuged briefly to
reclaim condensation. The primer and base-specific reactions are pooled into ethanol,
and the precipitated DNA is collected by centrifugation and dried. These sequencing
reactions can be stored for several days at -20°C.

The protocol for the sequencing reactions is as follows. For A and C
reactions, 1 ul, and for G and T reactions, 2 ul of each DNA sample (100 ng/ul for M13
templates and 200 ng/ul for pUC templates) are pipetted into the bottom of the 0.2 ml
thin-walled reaction tubes. AmpliTaq polymerase (N801-0060) is from Perkin-Elmer
Cetus (Foster City, CA).

A mix of 30 ul AmpliTaq (5 U/ul), 30 ul 5X Taq reaction buffer, and 130 ul
ddH2O, giving 190 ul of diluted Taq for 24 clones, is prepared.
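As a quick arithmetic check on the diluted Taq mix, the sketch below totals the three components and derives the enzyme concentration and the volume available per clone. The function name and the returned keys are illustrative; only the volumes, the 5 U/ul stock concentration and the 24-clone batch size come from the text.

```python
def diluted_enzyme(stock_ul, stock_units_per_ul, buffer_ul, water_ul, n_clones):
    """Summarize an enzyme dilution: total volume, activity and share per clone."""
    total_ul = stock_ul + buffer_ul + water_ul
    total_units = stock_ul * stock_units_per_ul
    return {
        "total_ul": total_ul,
        "units_per_ul": total_units / total_ul,
        "ul_per_clone": total_ul / n_clones,
    }


# 30 ul AmpliTaq at 5 U/ul, 30 ul 5X buffer, 130 ul ddH2O, for 24 clones.
mix = diluted_enzyme(30, 5, 30, 130, 24)
print(mix)
```

Under these figures the three components sum to 190 ul, and the diluted enzyme works out to a little under 0.8 U/ul.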

A, C, G, and T base specific mixes are prepared by adding base-specific 
primer and diluted Taq to each of the base specific nucleotide/buffer premixes: 



A,C/G,T
60/120 ul      5X Taq cycle sequencing mix
30/60 ul       diluted Taq polymerase
30/60 ul       respective fluorescent end-labeled primer
120/240 ul
B. Taq-polymerase catalyzed cycle sequencing using fluorescent-labeled dye
terminator reactions

One problem in DNA cycle sequencing is that when primers are used the
reaction conditions are such that the nested fragment set distribution is highly
dependent upon the template concentration in the reaction mix. It has recently been
observed that the nested fragment set distribution for the DNA cycle sequencing
reactions using the labeled terminators is much less sensitive to DNA concentration
than that obtained with the labeled primer reactions as described above. In addition, the
terminator reactions require only one reaction tube per template while the labeled
primer reactions require one reaction tube for each of the four terminators. The protocol
described below is easily interfaced with the 96 well template isolation and 96 well
reaction clean-up procedures also described herein.

Place 0.5 ug of single-stranded or 1 ug of double-stranded DNA in 0.2
ml PCR tubes. Add 1 ul (for single-stranded templates) or 4 ul (for double-stranded
templates) of 0.8 uM primer and 9.5 ul of ABI supplied premix to each tube, and bring
the final volume to 20 ul with ddH2O. Centrifuge briefly and cycle as usual using the
terminator program as described by the manufacturer (i.e., preheat at 96°C followed by
25 cycles of 96°C for 15 seconds, 50°C for 1 second, 60°C for 4 minutes, and then link
to a 4°C hold). Proceed with the spin column purification using either the Centri-Sep
columns (Amicon, Beverly, MA) or G-50 microtiter plate procedures given below.
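The terminator program can be written down as data, which makes the programmed hold time easy to estimate. The sketch below is only an illustration: the `Step` class and function name are hypothetical, and the estimate ignores the 96°C preheat, the temperature ramps between steps and the final 4°C hold, none of which are given durations in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time at that temperature


# One cycle of the terminator program: 96°C 15 s, 50°C 1 s, 60°C 4 min.
TERMINATOR_CYCLE = (Step(96, 15), Step(50, 1), Step(60, 4 * 60))
N_CYCLES = 25


def programmed_hold_seconds(cycle, n_cycles):
    """Total programmed hold time, excluding preheat and ramp times."""
    return n_cycles * sum(step.seconds for step in cycle)


total = programmed_hold_seconds(TERMINATOR_CYCLE, N_CYCLES)
print(total, "s =", round(total / 60, 1), "min")
```

The real run time on a given cycler will be longer, since ramping between temperatures is instrument dependent.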

C. Terminator Reaction Clean-Up via Centri-Sep Columns

A column is prepared by gently tapping the column to cause the gel
material to settle to the bottom of the column. The column stopper is removed and 0.75
ml dH2O is added. Stopper the column and invert it several times to mix. Allow the gel
to hydrate for at least 30 minutes at room temperature. Columns can be stored for a few
days at 4°C. Allow columns that have been stored at 4°C to warm to room temperature
before use. Remove any air bubbles by inverting the column and allowing the gel to
settle. Remove the upper-end cap first and then remove the lower-end cap. Allow the
column to drain completely, by gravity. (Note: If flow does not begin immediately,
apply gentle pressure to the column with a pipet bulb.) Insert the column into the wash
tube provided. Spin in a variable-speed microcentrifuge at 1300 xg for 2 minutes to
remove the fluid. Remove the column from the wash tube and insert it into a sample
collection tube. Carefully remove the reaction mixture (20 ul) and load it on top of the
gel material. If the samples were incubated in a cycling instrument that required
overlaying with oil, carefully remove the reaction from beneath the oil. Avoid picking
up oil with the sample, although small amounts of oil (<1 ul) in the sample will not
affect results. Oil at the end of the pipet tip containing the sample can be removed by
touching the tip carefully on a clean surface (e.g., the reaction tube). Use each column
only once. Spin in a variable-speed microcentrifuge with a fixed angle rotor, placing
the column in the same orientation as it was in for the first spin. Dry the sample in a
vacuum centrifuge. Do not apply heat or over-dry. If desired, reactions can be
precipitated with ethanol.

D. Terminator Reaction Clean-Up via Sephadex G-50 Filled Microtiter Format 
Filter Plates 

Sephadex (Pharmacia, Piscataway, NJ) settles out; therefore, you must
resuspend before adding to the plate and also after filling every 8 to 10 wells. Add 400
ul of mixed Sephadex G-50 to each well of the microtiter filter plate. Place the microtiter filter
plate on top of a microtiter plate to collect water and tape the sides so they do not fly apart
during centrifugation. Spin at 1500 rpm for 2 minutes. Discard water that has been
collected in the microtiter plate. Place the microtiter filter plate on top of a microtiter
plate to collect water and tape the sides so they do not fly apart during centrifugation. Add
an additional 100-200 ul of Sephadex G-50 to fill the microtiter plate wells. Spin at
1500 rpm for 2 minutes. Discard water that has been collected in the microtiter plate.
Place the microtiter filter plate on top of a microtiter plate with tubes to collect water
and tape the sides so they do not fly apart during centrifugation. Add 20 ul terminator
reaction to each Sephadex G-50-containing well. Spin at 1500 rpm for 2 minutes.
Place the collected effluent in a Speed-Vac for approximately 1-2 hours.
Sequenase™ (USB, Cleveland, OH) catalyzed sequencing with labeled
terminators. Single-stranded terminator reactions require approximately 2 ug of
phenol-extracted M13-based template DNA. The DNA is denatured and the primer annealed
by incubating DNA, primer, and buffer at 65°C. After the reaction has cooled to room
temperature, alpha-thio-deoxynucleotides, labeled terminators, and diluted Sequenase™
DNA polymerase are added and the mixture is incubated at 37°C. The reaction is
stopped by adding ammonium acetate and ethanol, and the DNA fragments are
precipitated and dried. To aid in the removal of unincorporated terminators, the DNA
pellet is rinsed twice with ethanol. The dried sequencing reactions can be stored up to
several days at -20°C.

Double-stranded terminator reactions require approximately 5 ug of
diatomaceous earth modified-alkaline lysis midi-prep purified plasmid DNA. The
double-stranded DNA is denatured by incubating the DNA in sodium hydroxide at
65°C, and after incubation, primer is added and the reaction is neutralized by adding an
acid-buffer. Reaction buffer, alpha-thio-deoxynucleotides, labeled dye-terminators, and
diluted Sequenase™ DNA polymerase then are added and the reaction is incubated at
37°C. Ammonium acetate is added to stop the reaction and the DNA fragments are
precipitated, rinsed, dried, and stored.

For single-stranded reactions:

Add the following to a 1.5 ml microcentrifuge tube:

4 ul     ss DNA (2 ug)
4 ul     0.8 uM primer
2 ul     10x MOPS buffer
2 ul     10x Mn2+/isocitrate buffer
12 ul

To denature the DNA and anneal the primer, incubate the reaction at
65°C-70°C for 5 minutes. Allow the reaction to cool at room temperature for 15
minutes, and then briefly centrifuge to reclaim condensation. To each reaction, add the
following reagents and incubate for 10 minutes at 37°C.

7 ul     ABI terminator mix (Catalogue No. 401489)
2 ul     diluted Sequenase™ (3.25 U/ul)
1 ul     2 mM alpha-S dNTPs
22 ul

The undiluted Sequenase™ (Catalogue No. 70775, United States
Biochemicals, Cleveland, OH) is 13 U/ul and is diluted 1:4 with USB dilution buffer
prior to use. Add 20 ul 9.5 M ammonium acetate and 100 ul 95% ethanol to stop the
reaction and mix.
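The single-stranded reaction volumes and the enzyme dilution can be cross-checked with a few lines of arithmetic. The sketch below assumes that "diluted 1:4" means one part enzyme in four parts total, which is the reading that reproduces the stated 3.25 U/ul from the 13 U/ul stock; the dictionary and function names are illustrative only.

```python
# Volumes in microliters, taken from the single-stranded protocol.
SS_ANNEAL_UL = {
    "ss DNA (2 ug)": 4,
    "0.8 uM primer": 4,
    "10x MOPS buffer": 2,
    "10x Mn2+/isocitrate buffer": 2,
}
SS_EXTEND_UL = {
    "ABI terminator mix": 7,
    "diluted Sequenase": 2,
    "2 mM alpha-S dNTPs": 1,
}


def dilute(stock_units_per_ul, total_parts):
    """Concentration after diluting one part stock into total_parts parts."""
    return stock_units_per_ul / total_parts


anneal_total = sum(SS_ANNEAL_UL.values())                 # subtotal before annealing
final_total = anneal_total + sum(SS_EXTEND_UL.values())   # final reaction volume
sequenase = dilute(13, 4)                                 # U/ul after the 1:4 dilution
units_per_reaction = SS_EXTEND_UL["diluted Sequenase"] * sequenase

print(anneal_total, final_total, sequenase, units_per_reaction)
```

The subtotals agree with the 12 ul and 22 ul lines of the tables, and each reaction receives 6.5 units of enzyme under this reading of the dilution.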

Precipitate the DNA in an ice-water bath for 10 minutes. Centrifuge for
15 minutes at 10,000 xg in a microcentrifuge at 4°C. Carefully decant the supernatant,
and rinse the pellet by adding 300 ul of 70-80% ethanol. Mix and centrifuge again for
15 minutes and carefully decant the supernatant.

Repeat the rinse step to ensure efficient removal of the unincorporated
terminators. Dry the DNA for 5-10 minutes (or until dry) in the Speed-Vac, and store
the dried reactions at -20°C.

For double-stranded reactions:

Add the following to a 1.5 ml microcentrifuge tube:

5 ul     ds DNA (5 ug)
4 ul     1 N NaOH
3 ul     ddH2O

Incubate the reaction at 65°C-70°C for 5 minutes, and then briefly
centrifuge to reclaim condensation. Add the following reagents to each reaction, vortex,
and briefly centrifuge:

3 ul     8 uM primer
9 ul     ddH2O
4 ul     MOPS-Acid buffer
To each reaction, add the following reagents and incubate for 10 minutes
at 37°C.

4 ul     10X Mn2+/isocitrate buffer
6 ul     ABI terminator mix
2 ul     diluted Sequenase™ (3.25 U/ul)
1 ul     2 mM [alpha]-S-dNTPs
22 ul

The undiluted Sequenase™ from United States Biochemicals is 13
U/ul and should be diluted 1:4 with USB dilution buffer prior to use. Add 60 ul 8 M
ammonium acetate and 300 ul 95% ethanol to stop the reaction and vortex. Precipitate
the DNA in an ice-water bath for 10 minutes. Centrifuge for 15 minutes at 10,000 xg in
a microcentrifuge at 4°C. Carefully decant the supernatant, and rinse the pellet by
adding 300 ul of 80% ethanol. Mix the sample and centrifuge again for 15 minutes, and
carefully decant the supernatant. Repeat the rinse step to ensure efficient removal of the
unincorporated terminators. Dry the DNA for 5-10 minutes (or until dry) in the
Speed-Vac.



E. Sequence gel preparation, pre-electrophoresis, sample loading, electrophoresis,
data collection, and analysis on the ABI 373A DNA sequencer

Polyacrylamide gels for DNA sequencing are prepared as described
above, except that the gel mix is filtered prior to polymerization. Glass plates are
carefully cleaned with hot water, distilled water, and ethanol to remove potential
fluorescent contaminants prior to taping. Denaturing 6% polyacrylamide gels are
poured into 0.3 mm x 89 cm x 52 cm taped plates and fitted with a 36 well comb. After
polymerization, the tape and the comb are removed from the gel and the outer surfaces
of the glass plates are cleaned with hot water, and rinsed with distilled water and
ethanol. The gel is assembled into an ABI sequencer, and then checked by
laser-scanning. If baseline alterations are observed on the ABI-associated Macintosh
computer display, the plates are recleaned. Subsequently, the buffer wells are attached,
electrophoresis buffer is added, and the gel is pre-electrophoresed for 10-30 minutes at
30 W. Prior to sample loading, the pooled and dried reaction products are resuspended
in formamide/EDTA loading buffer by vortexing and then heated at 90°C. A sample
sheet is created within the ABI data collection software on the Macintosh computer
which indicates the number of samples loaded and the fluorescent-labeled mobility file
to use for sequence data processing. After cleaning the sample wells with a syringe, the
odd-numbered sequencing reactions are loaded into the respective wells using a
micropipettor equipped with a flat-tipped gel-loading tip. The gel is then
electrophoresed for 5 minutes before the wells are cleaned again and the even-numbered
samples are loaded. The filter wheel used for dye-primers and dye-terminators is
specified on the ABI 373A CPU. Typically electrophoresis and data collection are for
10 hours at 30 W on the ABI 373A that is fitted with a heat-distributing aluminum plate.
After data collection, an image file is created by the ABI software that relates the
fluorescent signal detected to the corresponding scan number. The software then
determines the sample lane positions based on the signal intensities. After the lanes are
tracked, the cross-section of data for each lane is extracted and processed by baseline
subtraction, mobility calculation, spectral deconvolution, and time correction. After
processing, the sequence data files are transferred to a SPARCstation 2 using NFS
Share.

Protocol: prepare 8 M urea, 4.75% polyacrylamide gels, as described
above, using a 36-well comb. Prior to loading, clean the outer surface of the gel plates.
Assemble the gel plates into an ABI 373A DNA Sequencer (Foster City, CA) so that
the lower scan (usually the blue) line corresponds to an intensity value of 800-1000 as
displayed on the computer data collection window. If the baseline of four-color scan
lines is not flat, reclean the glass plates. Affix the aluminum heat distribution plate.
Pre-electrophorese the gel for 10-30 minutes. Prepare the samples for loading. Add 3
ul of FE to the bottom of each tube, vortex, heat at 90°C for 3 minutes, and centrifuge
to reclaim condensation. Flush the sample wells with electrophoresis buffer using a
syringe. Using flat-tipped gel loading pipette tips, load each odd-numbered sample.
Pre-electrophorese the gel for at least 5 minutes, flush the wells again, and then load
each even-numbered sample. Begin the electrophoresis (30 W for 10 hours). After data
collection, the ABI software will automatically open the data analysis software, which
will create the imaged gel file, extract the data for each sample lane, and process the
data.
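The staggered loading described above (odd-numbered samples first, a short electrophoresis, then even-numbered samples) can be laid out as two loading rounds. This is only a bookkeeping sketch; the function name is hypothetical and the default of 36 wells reflects the 36-well comb mentioned in the protocol.

```python
def loading_rounds(n_wells=36):
    """Split well numbers 1..n_wells into the odd round and the even round."""
    odd_round = list(range(1, n_wells + 1, 2))    # loaded first
    even_round = list(range(2, n_wells + 1, 2))   # loaded after the short run
    return odd_round, even_round


odd_round, even_round = loading_rounds()
print("round 1:", odd_round)
print("round 2:", even_round)
```

Loading neighbouring lanes a few minutes apart offsets their signals slightly, which presumably helps the software keep adjacent lanes apart during lane tracking.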



F. Double-stranded sequencing of cDNA clones containing long poly(A) tails
using anchored poly(dT) primers

Double-stranded templates of cDNAs containing long poly(A) tracts are
difficult to sequence with vector primers which anneal downstream of the poly(A) tail.
Sequencing with these primers results in a long poly(T) ladder followed by a sequence
which may be difficult to read. To circumvent this problem, three primers which
contain (dT)17 and either (dA) or (dC) or (dG) at the 3' end were designed to "anchor" the
primers and allow sequencing of the region immediately upstream of the poly(A)
region. Using this protocol, over 300 bp of readable sequence could be obtained. The
sequence of the opposite strand of these cDNAs was determined using insert-specific
primers upstream of the poly(A) region. The ability to directly obtain sequence
immediately upstream from the poly(A) tail of cDNAs should be of particular
importance to large scale efforts to generate sequence-tagged sites (STSs) from cDNAs.

The protocol is as follows. Synthesize anchored poly(dT)17 with
anchors of (dA) or (dC) or (dG) at the 3' end on a DNA synthesizer and use after
purification on Oligonucleotide Purification Cartridges (Amicon, Beverly, MA). For
sequencing with anchored primers, denature 5-10 ug of plasmid DNA in a total volume
of 50 ul containing 0.2 M sodium hydroxide and 0.16 mM EDTA by incubation at 65°C
for 10 minutes. Add the three poly(dT) anchored primers (2 pmol of each) and
immediately place the mixture on ice. Neutralize the solution by adding 5 ul of 5 M
ammonium acetate, pH 7.0.

Precipitate the DNA by adding 150 ul of cold 95% ethanol and wash the
pellet twice with cold 70% ethanol. Dry the pellet for 5 minutes and then resuspend in
MOPS buffer. Anneal the primers by heating the solution for 2 minutes at 65°C
followed by slow cooling to room temperature for 15-30 minutes. Perform sequencing
reactions using modified T7 DNA polymerase and alpha-[32P]dATP (>1000 Ci/mmole)
using the protocol described above.
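The logic of the three anchored primers can be illustrated in a few lines: the base immediately upstream of the poly(A) tract on the sense strand is by definition not A, so its complement is A, C or G, and exactly one of the three primers can pair there. The sketch below is illustrative; the function name and the example sequences are invented for the demonstration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The three anchored primers, written 5'->3': (dT)17 plus a 3' anchor base.
ANCHORED_PRIMERS = {anchor: "T" * 17 + anchor for anchor in "ACG"}


def matching_primer(sense_strand):
    """Pick the anchored primer that pairs at the start of the poly(A) tail.

    sense_strand is the cDNA sense strand, 5'->3', ending in its poly(A) tail.
    """
    seq = sense_strand.upper()
    body = seq.rstrip("A")            # everything upstream of the tail
    tail_length = len(seq) - len(body)
    if not body or tail_length < 17:
        raise ValueError("need an insert followed by at least 17 A residues")
    # The primer's 3' anchor pairs with the last non-A base before the tail.
    anchor = COMPLEMENT[body[-1]]
    return ANCHORED_PRIMERS[anchor]


print(matching_primer("GATTACAG" + "A" * 30))   # last non-A base is G
print(matching_primer("CCGT" + "A" * 20))       # last non-A base is T
```

Because all three primers are added together in the protocol, no such selection is needed at the bench; the sketch only shows why one of them always fits.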

G. cDNA sequencing based on PCR and random shotgun cloning

The following is a method for sequencing cloned cDNAs based on PCR
amplification, random shotgun cloning, and automated fluorescent sequencing. This
PCR-based approach uses a primer pair between the usual "universal" forward and
reverse priming sites and the multiple cloning sites of the Stratagene Bluescript vector.
These two PCR primers, with the sequence 5'-TCGAGGTCGACGGTATCG-3' (Seq.
ID No. 15) for the forward or -16bs primer and 5'-GCCGCTCTAGAACTAGTG-3'
(Seq. ID No. 16) for the reverse or +19bs primer, may be used to amplify sufficient
quantities of cDNA inserts in the 1.2 to 3.4 kb size range so that the random shotgun
sequencing approach described below can be implemented.
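A short check of the two PCR primers is sketched below: length, G+C count, and a rough melting temperature from the Wallace rule (2°C per A or T, 4°C per G or C). The Wallace rule is a generic rule of thumb for short oligonucleotides and is an addition here, not something the protocol itself uses; the dictionary keys and function names are illustrative.

```python
PRIMERS = {
    "forward (-16bs, Seq. ID No. 15)": "TCGAGGTCGACGGTATCG",
    "reverse (+19bs, Seq. ID No. 16)": "GCCGCTCTAGAACTAGTG",
}


def gc_count(seq):
    """Number of G and C residues in the primer."""
    return seq.count("G") + seq.count("C")


def wallace_tm(seq):
    """Rough melting temperature in °C: 2 per A/T plus 4 per G/C."""
    at_count = seq.count("A") + seq.count("T")
    return 2 * at_count + 4 * gc_count(seq)


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt,", gc_count(seq), "G+C, Tm ~", wallace_tm(seq), "°C")
```

Both estimates sit close to the 55°C annealing step used in the PCR protocol, though the rule is too crude to be taken as more than a sanity check.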

The following is the protocol. Incubate four 100 ul PCR reactions, each
containing approximately 100 ng of plasmid DNA, 100 pmoles of each primer, 50 mM
KCl, 10 mM Tris-HCl pH 8.5, 1.5 mM MgCl2, 0.2 mM of each dNTP, and 5 units of
PE-Cetus Amplitaq in 0.5 ml snap cap tubes for 25 cycles of 95°C for 1 minute, 55°C
for 1 minute and 72°C for 2 minutes in a PE-Cetus 48 tube DNA Thermal Cycler. After
pooling the four reactions, the aqueous solution containing the PCR product is placed in
a nebulizer, brought to 2.0 ml by adding approximately 0.5 to 1.0 ml of glycerol, and
equilibrated at -20°C by placing it in either an isopropyl alcohol/dry ice or saturated
aqueous NaCl/dry ice bath for 10 minutes. The sample is nebulized at -20°C by
applying 25-30 psi nitrogen pressure for 2.5 minutes. Following ethanol precipitation to
concentrate the sheared PCR product, the fragments are blunt-ended and
phosphorylated by incubation with the Klenow fragment of E. coli DNA polymerase
and T4 polynucleotide kinase as described previously. Fragments in the 0.4 to 0.7 kb
range are obtained by elution from a low melting agarose gel.

From the foregoing, it will be appreciated that, although specific
embodiments of the invention have been described herein for purposes of illustration,
various modifications may be made without deviating from the spirit and scope of the
invention. Accordingly, the invention is not limited except as by the appended claims.
(ii) TITLE OF INVENTION: METHODS AND COMPOSITIONS FOR DETERMINING
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1 : 
TGTAAAACGA CGGCCAGT 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
TGTAAAACGA CGGCCAGTA
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 
TGTAAAACGA CGGCCAGTAT 
(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
TGTAAAACGA CGGCCAGTAT G
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
TGTAAAACGA CGGCCAGTAT GC
(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
TGTAAAACGA CGGCCAGTAT GCA 
(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TGTAAAACGA CGGCCAGTAT GCAT 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
TGTAAAACGA CGGCCAGTAT GCATG 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
TGTAAAACGA CGGCCACG
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
TGTAAAACGA CGGCCAGCG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TGTAAAACGA CGGCCAGCGT 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TGTAAAACGA CGGCCAGCGT A 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
TGTAAAACGA CGGCCAGCGT AC

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
TGTAAAACGA CGGCCAGCGT ACC 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
TCGAGGTCGA CGGTATCG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCCGCTCTAG AACTAGTG 

Claims

We Claim:

1. A method for determining the sequence of a nucleic acid molecule,
comprising:

(a) generating tagged nucleic acid fragments which are 
complementary to a selected target nucleic acid molecule, wherein a tag is correlative 
with a particular nucleotide and detectable by non-fluorescent spectrometry or 
potentiometry; 

(b) separating the tagged fragments by sequential length; 

(c) cleaving the tags from the tagged fragments; and 

(d) detecting the tags by non-fluorescent spectrometry or 
potentiometry, and therefrom determining the sequence of the nucleic acid molecule. 

2. The method according to claim 1 wherein the detection of the tags 
is by mass spectrometry, infrared spectrometry, ultraviolet spectrometry or potentiostatic 
amperometry. 

3. The method according to claims 1 or 2 wherein the tagged 
fragments are separated in step (b) by a method selected from gel electrophoresis, 
capillary electrophoresis, micro-channel electrophoresis and HPLC. 

4. The method according to claims 1 or 2 wherein the tagged
fragments are cleaved in step (c) by a method selected from oxidation, reduction, acid- 
labile, base-labile, enzymatic, electrochemical, thermal, thiol exchange and photolabile 
methods. 

5. The method according to claim 2 wherein the tags are detected by 
time-of-flight mass spectrometry, quadrupole mass spectrometry, magnetic sector mass 
spectrometry or electric sector mass spectrometry. 
6. The method according to claim 2 wherein the tags are detected by
coulometric detectors or amperometric detectors. 

7. The method according to claims 1 or 2 wherein the tagged nucleic 
acid fragments are generated in step (a) from a 5' terminus to a 3' terminus. 

8. The method according to claims 1 or 2 wherein step (a) generates 
more than four of the tagged nucleic acid fragments and each tag is unique for a nucleic 
acid fragment. 



9. The method according to claims 1 or 2 wherein steps (b), (c) and 
(d) are performed in a continuous manner. 



10. The method according to claims 1 or 2 wherein steps (b), (c) and 
(d) are performed in a continuous manner on a system. 

11. The method according to claims 1 or 2 wherein one or more of the 
steps is automated. 



12. The method according to claims 1 or 2 wherein the tagged 
fragments are generated from oligonucleotide primers that are conjugated to a tag at other 
than the 3' end of the primer. 

13. The method according to claims 1 or 2 wherein the tagged 
fragments are generated from tagged dideoxynucleotide terminators. 

14. The method according to claims 1 or 2 wherein at least one tagged
nucleic acid fragment is a compound according to any one of claims 15 to 33.

15. A compound of the formula: 
T^ms-L-X

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid;

X is a functional group selected from hydroxyl, amino, thiol, carboxylic
acid, haloalkyl, and derivatives thereof which either activate or inhibit the activity of the
group toward coupling with other moieties, or is a nucleic acid fragment attached to L at
other than the 3' end of the nucleic acid fragment;

with the provisos that the compound is not bonded to a solid support
through X nor has a mass of less than 250 daltons.

16. A compound according to claim 15 wherein T^ms has a mass of from
15 to 10,000 daltons and a molecular formula of
C(1-500)N(0-100)O(0-100)S(0-10)P(0-10)H(α)F(β)I(δ) wherein
the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C,
N, O, P and S atoms.

17. A compound according to claim 15 wherein T^ms and L are bonded
together through a functional group selected from amide, ester, ether, amine, sulfide,
thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate, Schiff base,
reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate, phosphoramide,
phosphonamide, sulfonate, sulfonamide or carbon-carbon bond.

18. A compound according to claim 17 wherein the functional group is
selected from amide, ester, amine, urea and carbamate.

19. A compound according to claim 15 wherein L is selected from L^hν,
L^acid, L^base, L^[O], L^[R], L^enz, L^elc, L^Δ and L^ss, where actinic radiation, acid, base, oxidation,
reduction, enzyme, electrochemical, thermal and thiol exchange, respectively, cause the
T^ms-containing moiety to be cleaved from the remainder of the molecule.

20. A compound according to claim 19 wherein L^hν has the formula
L^1-L^2-L^3, wherein L^2 is a molecular fragment that absorbs actinic radiation to promote the
cleavage of T^ms from X, and L^1 and L^3 are independently a direct bond or an organic
moiety, where L^1 separates L^2 from T^ms and L^3 separates L^2 from X, and neither L^1 nor L^3
undergo bond cleavage when L^2 absorbs the actinic radiation.

21. A compound according to claim 20 wherein -L 2 -L 3 has the 

formula: 




NO, 



with one carbon atom at positions a, b, c, d or e being substituted with -L 3 - 
X and optionally one or more of positions b, c, d or e being substituted with alkyl, 
alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide; and R' is hydrogen or 
hydrocarbyl. 



22. A compound according to claim 21 wherein X is -C(=O)-R^2 and R^2
is -OH or a group that either protects or activates a carboxylic acid for coupling with
another moiety.

23. A compound according to claim 20 wherein L^3 is selected from a
direct bond, a hydrocarbylene, -O-hydrocarbylene, and hydrocarbylene-(O-hydrocarbylene)n-H,
and n is an integer ranging from 1 to 10.

24. A compound according to claim 15 wherein -L-X has the formula: 




wherein one or more of positions b, c, d or e is substituted with hydrogen, 
alkyl, alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide; and R^1 is hydrogen or
hydrocarbyl. 

25. A compound according to claim 15 wherein T^ms has the formula:

T^2-(J-T^3-)n-

T^2 is an organic moiety formed from carbon and one or more of hydrogen,
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass of 15 to 500
daltons;

T^3 is an organic moiety formed from carbon and one or more of hydrogen,
fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass of 50 to 1000
daltons;

J is a direct bond or a functional group selected from amide, ester, amine,
sulfide, ether, thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate,
Schiff base, reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate,
phosphoramide, phosphonamide, sulfonate, sulfonamide or carbon-carbon bond; and

n is an integer ranging from 1 to 50, and when n is greater than 1, each T^3
and J is independently selected.






26. A compound according to claim 25 wherein T^2 is selected from
hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl-S-hydrocarbylene,
hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-amide-hydrocarbylene,
N-(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene,
hydrocarbylacyl-hydrocarbylene, heterocyclylhydrocarbyl wherein the heteroatom(s) are
selected from oxygen, nitrogen, sulfur and phosphorus, substituted heterocyclylhydrocarbyl
wherein the heteroatom(s) are selected from oxygen, nitrogen, sulfur and phosphorus and the
substituents are selected from hydrocarbyl, hydrocarbyl-O-hydrocarbylene,
hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-S-hydrocarbylene, N-(hydrocarbyl)hydrocarbylene,
N,N-di(hydrocarbyl)hydrocarbylene and hydrocarbylacyl-hydrocarbylene, as well as
derivatives of any of the foregoing wherein one or more hydrogens is replaced with an
equal number of fluorides.

27. A compound according to claim 25 wherein T^3 has the formula
-G(R^2)-, G is C1-6 alkylene having a single R^2 substituent, and R^2 is selected from alkyl,
alkenyl, alkynyl, cycloalkyl, aryl-fused cycloalkyl, cycloalkenyl, aryl, aralkyl,
aryl-substituted alkenyl or alkynyl, cycloalkyl-substituted alkyl, cycloalkenyl-substituted
cycloalkyl, biaryl, alkoxy, alkenoxy, alkynoxy, aralkoxy, aryl-substituted alkenoxy or
alkynoxy, alkylamino, alkenylamino or alkynylamino, aryl-substituted alkylamino,
aryl-substituted alkenylamino or alkynylamino, aryloxy, arylamino,
N-alkylurea-substituted alkyl, N-arylurea-substituted alkyl,
alkylcarbonylamino-substituted alkyl, aminocarbonyl-substituted alkyl, heterocyclyl,
heterocyclyl-substituted alkyl, heterocyclyl-substituted amino, carboxyalkyl substituted
aralkyl, oxocarbocyclyl-fused aryl and heterocyclylalkyl; cycloalkenyl, aryl-substituted
alkyl and aralkyl, hydroxy-substituted alkyl, alkoxy-substituted alkyl,
aralkoxy-substituted alkyl, alkoxy-substituted alkyl, aralkoxy-substituted alkyl, amino-substituted
alkyl, (aryl-substituted alkyloxycarbonylamino)-substituted alkyl, thiol-substituted alkyl,
alkylsulfonyl-substituted alkyl, (hydroxy-substituted alkylthio)-substituted alkyl,
thioalkoxy-substituted alkyl, hydrocarbylacylamino-substituted alkyl,
heterocyclylacylamino-substituted alkyl, hydrocarbyl-substituted-heterocyclylacylamino-substituted
alkyl, alkylsulfonylamino-substituted alkyl, arylsulfonylamino-substituted
alkyl, morpholino-alkyl, thiomorpholino-alkyl, morpholinocarbonyl-substituted alkyl,
thiomorpholinocarbonyl-substituted alkyl, [N-(alkyl, alkenyl or alkynyl)- or N,N-[dialkyl,
dialkenyl, dialkynyl or (alkyl, alkenyl)-amino]carbonyl-substituted alkyl,
heterocyclylaminocarbonyl, heterocyclylalkyleneaminocarbonyl,
heterocyclylaminocarbonyl-substituted alkyl, heterocyclylalkyleneaminocarbonyl-substituted
alkyl, N,N-[dialkyl]alkyleneaminocarbonyl,
N,N-[dialkyl]alkyleneaminocarbonyl-substituted alkyl, alkyl-substituted heterocyclylcarbonyl,
alkyl-substituted heterocyclylcarbonyl-alkyl, carboxyl-substituted alkyl,
dialkylamino-substituted acylaminoalkyl and amino acid side chains selected from arginine, asparagine,
glutamine, S-methyl cysteine, methionine and corresponding sulfoxide and sulfone
derivatives thereof, glycine, leucine, isoleucine, allo-isoleucine, tert-leucine, norleucine,
phenylalanine, tyrosine, tryptophan, proline, alanine, ornithine, histidine, glutamine,
valine, threonine, serine, aspartic acid, beta-cyanoalanine, and allothreonine; alkynyl and
heterocyclylcarbonyl, aminocarbonyl, amido, mono- or dialkylaminocarbonyl, mono- or
diarylaminocarbonyl, alkylarylaminocarbonyl, diarylaminocarbonyl, mono- or
diacylaminocarbonyl, aromatic or aliphatic acyl, alkyl optionally substituted by
substituents selected from amino, carboxy, hydroxy, mercapto, mono- or dialkylamino,
mono- or diarylamino, alkylarylamino, diarylamino, mono- or diacylamino, alkoxy,
alkenoxy, aryloxy, thioalkoxy, thioalkenoxy, thioalkynoxy, thioaryloxy and heterocyclyl.

28. A compound according to claim 25 having the formula:

[structural formula not reproduced; surviving labels: T^4, Amide, (CH2)c, R^1, O]

wherein

G is (CH2)1-6 wherein a hydrogen on one and only one of the CH2 groups
of each G is replaced with -(CH2)c-Amide-T^4;

T^2 and T^4 are organic moieties of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ
wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies
of the C, N, O, S and P atoms;

Amide is -N-C(=O)- or -C(=O)-N-;

R^1 is hydrogen or C1-10 alkyl;

c is an integer ranging from 0 to 4;

X is defined according to claim 1; and

n is an integer ranging from 1 to 50 such that when n is greater than 1, G,
c, Amide, R^1 and T^4 are independently selected.

29. A compound according to claim 28 having the formula:

[structural formula not reproduced; surviving labels: T^4, Amide, (CH2)c, R^1, O]

wherein T^5 is an organic moiety of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ
wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies
of the C, N, O, S and P atoms; and T^5 includes a tertiary or quaternary amine or an
organic acid; and m is an integer ranging from 0-49.



30. A compound according to claim 28 having the formula:

[structural formula not reproduced; surviving labels: T^4, Amide, T^2, (CH2)c, R^1, O]

wherein T^5 is an organic moiety of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ
wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies
of the C, N, O, S and P atoms; and T^5 includes a tertiary or quaternary amine or an
organic acid; and m is an integer ranging from 0-49.

31. A compound according to any one of claims 29 and 30 
wherein -Amide-T^5 is selected from:

[structural formulas not reproduced; the surviving fragments show -NHC(=O)- amides,
including one bearing -O-(C2-C10)-N(C1-C10)2]


32. A compound according to any of claims 29 and 30 wherein 
-Amide-T^5 is selected from:

[structural formulas not reproduced; the surviving fragments show -C(=O)NH-(C1-C10)- and
-C(=O)NH-(C2-C10)- linkers attached to amine-containing groups such as -N(C1-C10)2]


33. A compound according to any one of claims 25-29 wherein T^2 has
the structure which results when one of the following organic acids is condensed with an
amine group to form T^2-C(=O)-N(R^1)-: Formic acid, Acetic acid, Propiolic acid,
Propionic acid, Fluoroacetic acid, 2-Butynoic acid, Cyclopropanecarboxylic acid, Butyric
acid, Methoxyacetic acid, Difluoroacetic acid, 4-Pentynoic acid, Cyclobutanecarboxylic
acid, 3,3-Dimethylacrylic acid, Valeric acid, N,N-Dimethylglycine, N-Formyl-Gly-OH,
Ethoxyacetic acid, (Methylthio)acetic acid, Pyrrole-2-carboxylic acid, 3-Furoic acid,
Isoxazole-5-carboxylic acid, trans-3-Hexenoic acid, Trifluoroacetic acid, Hexanoic acid,
Ac-Gly-OH, 2-Hydroxy-2-methylbutyric acid, Benzoic acid, Nicotinic acid,
2-Pyrazinecarboxylic acid, 1-Methyl-2-pyrrolecarboxylic acid, 2-Cyclopentene-1-acetic
acid, Cyclopentylacetic acid, (S)-(-)-2-Pyrrolidone-5-carboxylic acid,
N-Methyl-L-proline, Heptanoic acid, Ac-b-Ala-OH, 2-Ethyl-2-hydroxybutyric acid,
2-(2-Methoxyethoxy)acetic acid, p-Toluic acid, 6-Methylnicotinic acid,
5-Methyl-2-pyrazinecarboxylic acid, 2,5-Dimethylpyrrole-3-carboxylic acid, 4-Fluorobenzoic acid,
3,5-Dimethylisoxazole-4-carboxylic acid, 3-Cyclopentylpropionic acid, Octanoic acid,
N,N-Dimethylsuccinamic acid, Phenylpropiolic acid, Cinnamic acid, 4-Ethylbenzoic
acid, p-Anisic acid, 1,2,5-Trimethylpyrrole-3-carboxylic acid, 3-Fluoro-4-methylbenzoic
acid, Ac-DL-Propargylglycine, 3-(Trifluoromethyl)butyric acid, 1-Piperidinepropionic
acid, N-Acetylproline, 3,5-Difluorobenzoic acid, Ac-L-Val-OH, Indole-2-carboxylic acid,
2-Benzofurancarboxylic acid, Benzotriazole-5-carboxylic acid, 4-n-Propylbenzoic acid,
3-Dimethylaminobenzoic acid, 4-Ethoxybenzoic acid, 4-(Methylthio)benzoic acid,
N-(2-Furoyl)glycine, 2-(Methylthio)nicotinic acid, 3-Fluoro-4-methoxybenzoic acid,
Tfa-Gly-OH, 2-Naphthoic acid, Quinaldic acid, Ac-L-Ile-OH, 3-Methylindene-2-carboxylic acid,
2-Quinoxalinecarboxylic acid, 1-Methylindole-2-carboxylic acid, 2,3,6-Trifluorobenzoic
acid, N-Formyl-L-Met-OH, 2-[2-(2-Methoxyethoxy)ethoxy]acetic acid, 4-n-Butylbenzoic
acid, N-Benzoylglycine, 5-Fluoroindole-2-carboxylic acid, 4-n-Propoxybenzoic acid,
4-Acetyl-3,5-dimethyl-2-pyrrolecarboxylic acid, 3,5-Dimethoxybenzoic acid,
2,6-Dimethoxynicotinic acid, Cyclohexanepentanoic acid, 2-Naphthylacetic acid,
4-(1H-Pyrrol-1-yl)benzoic acid, Indole-3-propionic acid, m-Trifluoromethylbenzoic acid,
5-Methoxyindole-2-carboxylic acid, 4-Pentylbenzoic acid, Bz-b-Ala-OH,
4-Diethylaminobenzoic acid, 4-n-Butoxybenzoic acid, 3-Methyl-5-CF3-isoxazole-4-carboxylic
acid, (3,4-Dimethoxyphenyl)acetic acid, 4-Biphenylcarboxylic acid,
Pivaloyl-Pro-OH, Octanoyl-Gly-OH, (2-Naphthoxy)acetic acid, Indole-3-butyric acid,
4-(Trifluoromethyl)phenylacetic acid, 5-Methoxyindole-3-acetic acid,
4-(Trifluoromethoxy)benzoic acid, Ac-L-Phe-OH, 4-Pentyloxybenzoic acid, Z-Gly-OH,
4-Carboxy-N-(fur-2-ylmethyl)pyrrolidin-2-one, 3,4-Diethoxybenzoic acid,
2,4-Dimethyl-5-CO2Et-pyrrole-3-carboxylic acid, N-(2-Fluorophenyl)succinamic acid,
3,4,5-Trimethoxybenzoic acid, N-Phenylanthranilic acid, 3-Phenoxybenzoic acid,
Nonanoyl-Gly-OH, 2-Phenoxypyridine-3-carboxylic acid, 2,5-Dimethyl-1-phenylpyrrole-3-carboxylic
acid, trans-4-(Trifluoromethyl)cinnamic acid, (5-Methyl-2-phenyloxazol-4-yl)acetic
acid, 4-(2-Cyclohexenyloxy)benzoic acid, 5-Methoxy-2-methylindole-3-acetic
acid, trans-4-Cotininecarboxylic acid, Bz-5-Aminovaleric acid, 4-Hexyloxybenzoic acid,
N-(3-Methoxyphenyl)succinamic acid, Z-Sar-OH, 4-(3,4-Dimethoxyphenyl)butyric acid,
Ac-o-Fluoro-DL-Phe-OH, N-(4-Fluorophenyl)glutaramic acid,
4'-Ethyl-4-biphenylcarboxylic acid, 1,2,3,4-Tetrahydroacridinecarboxylic acid,
3-Phenoxyphenylacetic acid, N-(2,4-Difluorophenyl)succinamic acid, N-Decanoyl-Gly-OH,
(+)-6-Methoxy-α-methyl-2-naphthaleneacetic acid, 3-(Trifluoromethoxy)cinnamic
acid, N-Formyl-DL-Trp-OH, (R)-(+)-α-Methoxy-α-(trifluoromethyl)phenylacetic acid,
Bz-DL-Leu-OH, 4-(Trifluoromethoxy)phenoxyacetic acid, 4-Heptyloxybenzoic acid,
2,3,4-Trimethoxycinnamic acid, 2,6-Dimethoxybenzoyl-Gly-OH,
3-(3,4,5-Trimethoxyphenyl)propionic acid, 2,3,4,5,6-Pentafluorophenoxyacetic acid,
N-(2,4-Difluorophenyl)glutaramic acid, N-Undecanoyl-Gly-OH, 2-(4-Fluorobenzoyl)benzoic
acid, 5-Trifluoromethoxyindole-2-carboxylic acid, N-(2,4-Difluorophenyl)diglycolamic
acid, Ac-L-Trp-OH, Tfa-L-Phenylglycine-OH, 3-Iodobenzoic acid,
3-(4-n-Pentylbenzoyl)propionic acid, 2-Phenyl-4-quinolinecarboxylic acid, 4-Octyloxybenzoic
acid, Bz-L-Met-OH, 3,4,5-Triethoxybenzoic acid, N-Lauroyl-Gly-OH,
3,5-Bis(trifluoromethyl)benzoic acid, Ac-5-Methyl-DL-Trp-OH, 2-Iodophenylacetic acid,
3-Iodo-4-methylbenzoic acid, 3-(4-n-Hexylbenzoyl)propionic acid, N-Hexanoyl-L-Phe-OH,
4-Nonyloxybenzoic acid, 4'-(Trifluoromethyl)-2-biphenylcarboxylic acid, Bz-L-Phe-OH,
N-Tridecanoyl-Gly-OH, 3,5-Bis(trifluoromethyl)phenylacetic acid,
3-(4-n-Heptylbenzoyl)propionic acid, N-Heptanoyl-L-Phe-OH, 4-Decyloxybenzoic acid,
N-(α,α,α-trifluoro-m-tolyl)anthranilic acid, Niflumic acid,
4-(2-Hydroxyhexafluoroisopropyl)benzoic acid, N-Myristoyl-Gly-OH,
3-(4-n-Octylbenzoyl)propionic acid, N-Octanoyl-L-Phe-OH, 4-Undecyloxybenzoic acid,
3-(3,4,5-Trimethoxyphenyl)propionyl-Gly-OH, 8-Iodonaphthoic acid, N-Pentadecanoyl-Gly-OH,
4-Dodecyloxybenzoic acid, N-Palmitoyl-Gly-OH, and N-Stearoyl-Gly-OH.

34. A composition comprising a plurality of compounds of the formula:

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid;

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; and

wherein no two compounds have either the same T^ms or the same MOI.

35. A composition according to claim 34 wherein the plurality is
greater than 2.

36. A composition according to claim 34 wherein the plurality is
greater than 4.



37. A composition according to claim 34 wherein the nucleic acid 
fragment has a sequence complementary to a portion of a vector, wherein the fragment is 
capable of priming nucleotide synthesis. 

38. A composition according to claim 34 wherein the T^ms groups of
members of the plurality differ by at least 2 amu.

39. A composition according to claim 34 wherein the T^ms groups of
members of the plurality differ by at least 4 amu.
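
By way of illustration, the spacing conditions of claims 38 and 39 can be read as a requirement on every pair of tag masses in the plurality. The sketch below assumes that pairwise reading; the function names and the example masses are hypothetical and are not taken from the specification.

```python
from itertools import combinations


def min_pairwise_gap(masses):
    """Return the smallest absolute difference between any two tag masses (amu).

    Requires at least two masses; min() raises ValueError otherwise.
    """
    return min(abs(a - b) for a, b in combinations(masses, 2))


def satisfies_spacing(masses, min_gap_amu):
    """True if every pair of tag masses differs by at least min_gap_amu."""
    return min_pairwise_gap(masses) >= min_gap_amu


# Hypothetical tag masses: the closest pair is 2 amu apart.
example = [100.0, 102.0, 106.0]
print(satisfies_spacing(example, 2))  # meets the 2 amu condition
print(satisfies_spacing(example, 4))  # does not meet the 4 amu condition
```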

40. A composition comprising water and a compound of the formula:

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid; and

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI.



41. A composition according to claim 40 further comprising buffer, 
having a pH of about 5 to about 9. 

42. A composition according to claim 40 further comprising an 
enzyme and one of dATP, dGTP, dCTP, and dTTP. 



43. A composition according to claim 40 further comprising an
enzyme and one of ddATP, ddGTP, ddCTP, and ddTTP.



44. A composition comprising a plurality of sets of compounds, each
set of compounds having the formula:

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid;

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI;

wherein within a set, all members have the same T^ms group, and the MOI
fragments have variable lengths that terminate with the same dideoxynucleotide selected
from ddAMP, ddGMP, ddCMP and ddTMP; and

wherein between sets, the T^ms groups differ by at least 2 amu.
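
As a hedged illustration of how such sets could be used, the sketch below assumes that each set's tag identifies its terminating dideoxynucleotide, that fragments have been separated by length, and that a tag mass has been measured for each fragment. Under those assumptions, ordering the fragments by length and looking up each tag gives the base calls for the synthesized strand. The function name, the tolerance and all masses are hypothetical.

```python
def read_sequence(observations, tag_to_base, tolerance=0.5):
    """Call one base per fragment, ordered by fragment length.

    observations: iterable of (fragment_length, measured_tag_mass) pairs.
    tag_to_base:  mapping of nominal tag mass (amu) -> terminating base.
    tolerance:    allowed deviation between measured and nominal mass (amu).
    """
    calls = []
    for length, mass in sorted(observations):
        matches = [base for nominal, base in tag_to_base.items()
                   if abs(nominal - mass) <= tolerance]
        if len(matches) != 1:
            raise ValueError(f"cannot assign tag mass {mass} at length {length}")
        calls.append(matches[0])
    return "".join(calls)


# Hypothetical tags spaced 4 amu apart, one per terminating dideoxynucleotide.
tags = {200.0: "A", 204.0: "C", 208.0: "G", 212.0: "T"}
fragments = [(22, 208.1), (21, 200.0), (23, 212.2), (24, 204.0)]
print(read_sequence(fragments, tags))
```

Keeping the tag masses at least 2 amu apart, as the claim requires, is what lets a tolerance of this size assign each measurement to a single tag.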

45. A composition according to claim 44 wherein the plurality is at
least 3.

46. A composition according to claim 44 wherein the plurality is at
least 5.

47. A composition comprising a first plurality of sets of compounds 
according to claim 44, and a second plurality of sets of compounds having the formula 

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid;

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; and

wherein all members within the second plurality have an MOI sequence 
which terminates with the same dideoxynucleotide selected from ddAMP, ddGMP, 
ddCMP and ddTMP; with the proviso that the dideoxynucleotide present in the 
compounds of the first plurality is not the same dideoxynucleotide present in the 
compounds of the second plurality. 

48. A kit for DNA sequencing analysis comprising a plurality of
container sets, each container set comprising at least five containers, wherein a first
container contains a vector, a second, third, fourth and fifth containers contain
compounds of the formula:

T^ms-L-MOI

wherein,

T^ms is an organic group detectable by mass spectrometry, comprising
carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen,
nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved
from the remainder of the compound, wherein the T^ms-containing moiety comprises a
functional group which supports a single ionized charge state when the compound is
subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and
organic acid; and

MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a
location other than the 3' end of the MOI; such that

the MOI for the second, third, fourth and fifth containers is identical and
complementary to a portion of the vector within the set of containers, and the T^ms group
within each container is different from the other T^ms groups in the kit.

49. A kit according to claim 48 wherein the plurality is at least 3.

50. A kit according to claim 48 wherein the plurality is at least 5. 

51. A system for determining the sequence of a nucleic acid molecule, 
comprising a separation apparatus that separates tagged nucleic acid fragments, an 
apparatus that cleaves from a tagged nucleic acid fragment a tag which is correlative with 
a particular nucleotide and detectable by electrochemical detection, and an apparatus for 
potentiostatic amperometry. 